Message ID | 20090427082914.GA383@lst.de (mailing list archive)
State      | New, archived
Christoph Hellwig wrote:
> [had the qemu list address wrong the first time, reply to this message,
> not the previous if you were on Cc]
>
> Add support for SG_IO passthru (packet commands) to the virtio-blk
> backend.  Conceptually based on an older patch from Hannes Reinecke
> but largely rewritten to match the code structure and layering in
> virtio-blk.
>
> Note that currently we issue the host SG_IO synchronously.  We could
> easily switch to async I/O, but that would require either bloating
> the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
> memory allocation for each SG_IO request.

I think that's worthwhile.  The extra bloat is trivial (especially as
the number of inflight virtio requests is tightly bounded), and stalling
the vcpu for requests is a pain.
Christoph Hellwig wrote:
> [had the qemu list address wrong the first time, reply to this message,
> not the previous if you were on Cc]
>
> Add support for SG_IO passthru (packet commands) to the virtio-blk
> backend.  Conceptually based on an older patch from Hannes Reinecke
> but largely rewritten to match the code structure and layering in
> virtio-blk.
>
> Note that currently we issue the host SG_IO synchronously.  We could
> easily switch to async I/O, but that would require either bloating
> the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
> memory allocation for each SG_IO request.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

So practically speaking, what can you do with this?  Should we be
handling some SCSI cmds internally to QEMU (like eject operations) and
supporting media=cdrom in -drive for if=virtio?

On a related topic, should we switch /dev/vdX to be /dev/sdX?

Regards,

Anthony Liguori
On Mon, Apr 27, 2009 at 12:15:31PM +0300, Avi Kivity wrote:
> I think that's worthwhile.  The extra bloat is trivial (especially as
> the number of inflight virtio requests is tightly bounded), and stalling
> the vcpu for requests is a pain.

Ok, new patch will follow ASAP.
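For readers following the thread, here is a minimal sketch of the "bloat"
option discussed above: embedding the sg_io_hdr in the request itself so
an async conversion needs no per-command allocation.  This is a
hypothetical illustration, not Christoph's actual follow-up patch; the
surrounding fields are taken from the patch at the end of this thread.

/* Hypothetical sketch only -- not the actual follow-up patch.  Embedding
 * the sg_io_hdr in VirtIOBlockReq costs ~sizeof(struct sg_io_hdr) bytes
 * per in-flight request, in exchange for not allocating one per SG_IO
 * command when submitting asynchronously. */
typedef struct VirtIOBlockReq
{
    VirtIOBlock *dev;
    VirtQueueElement elem;
    struct virtio_blk_inhdr *in;
    struct virtio_blk_outhdr *out;
    struct virtio_scsi_inhdr *scsi;
#ifdef __linux__
    struct sg_io_hdr hdr;        /* the "bloat" Avi considers acceptable */
#endif
    QEMUIOVector qiov;
    struct VirtIOBlockReq *next;
} VirtIOBlockReq;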
On Mon, Apr 27, 2009 at 09:36:51AM -0500, Anthony Liguori wrote:
> So practically speaking, what can you do with this?

It's interesting if your underlying device is a real scsi disk and you
want virtio for efficiency, but also want to be able to issue scsi
commands on the underlying device, e.g. for finding out more information
about the underlying setup using the sg_* tools, for certain manual
multipathing setups, or for upgrading firmware, or ...

> Should we be handling some SCSI cmds internally to QEMU (like eject
> operations) and supporting media=cdrom in -drive for if=virtio?

Not quite yet.  Eventually I want to support a virtio-scsi kind of
transport which would use the same virtio-blk protocol but only send
scsi commands.  We'd need a different driver on the Linux side for it
that registers with the scsi layer.  On the QEMU side it could either do
pass-through to a real scsi device using SG_IO or use the existing scsi
command emulator in scsi-disk.c.

> On a related topic, should we switch /dev/vdX to be /dev/sdX?

We can't and don't want to for the current virtio-blk driver.  Once we
implement a virtio-scsi driver, disks in the guest will show up as
/dev/sdX.
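To make the sg_* use case concrete, here is a minimal sketch of what a
guest tool could do once this passthrough is in place: a standard
INQUIRY issued through the SG_IO ioctl.  The device name /dev/vda, the
96-byte allocation length, and the error handling are illustrative
assumptions only.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(void)
{
    unsigned char cdb[6] = { 0x12, 0, 0, 0, 96, 0 };   /* INQUIRY, 96 bytes */
    unsigned char buf[96], sense[32];
    struct sg_io_hdr hdr;
    int fd = open("/dev/vda", O_RDONLY);   /* hypothetical guest device */

    if (fd < 0)
        return 1;

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';
    hdr.cmd_len = sizeof(cdb);
    hdr.cmdp = cdb;
    hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    hdr.dxfer_len = sizeof(buf);
    hdr.dxferp = buf;
    hdr.mx_sb_len = sizeof(sense);
    hdr.sbp = sense;
    hdr.timeout = 5000;                    /* milliseconds */

    /* With the patch, this ioctl is forwarded to the host device. */
    if (ioctl(fd, SG_IO, &hdr) == 0 && hdr.status == 0)
        printf("vendor: %.8s  model: %.16s\n", buf + 8, buf + 16);

    close(fd);
    return 0;
}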
Christoph Hellwig wrote:
>> Should we be handling some SCSI cmds internally to QEMU (like eject
>> operations) and supporting media=cdrom in -drive for if=virtio?
>
> Not quite yet.  Eventually I want to support a virtio-scsi kind of
> transport which would use the same virtio-blk protocol but only send
> scsi commands.  We'd need a different driver on the Linux side for it
> that registers with the scsi layer.  On the QEMU side it could either do
> pass-through to a real scsi device using SG_IO or use the existing scsi
> command emulator in scsi-disk.c.

Ah, excellent.  I think that's a great thing to do.  So do you think
virtio-scsi would deprecate virtio-blk?

I've gotten a number of requests for exposing stuff like VPD through
virtio-blk.  There's also a surprising amount of software out there that
assumes /dev/sdX or /dev/hdX naming.

Regards,

Anthony Liguori
On Tuesday 28 April 2009 18:37:01, Anthony Liguori wrote:
> Christoph Hellwig wrote:
> >> Should we be handling some SCSI cmds internally to QEMU (like eject
> >> operations) and supporting media=cdrom in -drive for if=virtio?
> >
> > Not quite yet.  Eventually I want to support a virtio-scsi kind of
> > transport which would use the same virtio-blk protocol but only send
> > scsi commands.  We'd need a different driver on the Linux side for it
> > that registers with the scsi layer.  On the QEMU side it could either do
> > pass-through to a real scsi device using SG_IO or use the existing scsi
> > command emulator in scsi-disk.c.
>
> Ah, excellent.  I think that's a great thing to do.  So do you think
> virtio-scsi would deprecate virtio-blk?

There are lots of other block device drivers with strange
characteristics, for example dasd/xpram disks with 4k block size or
other disks with strange partitioning schemes.  Doing a scsi emulation
on top of these beasts looks possible but far from ideal.  So I guess we
will need virtio_blk for non-scsi block devices.

Christian
On Tuesday 28 April 2009 11:51:54, Christoph Hellwig wrote:
> Not quite yet.  Eventually I want to support a virtio-scsi kind of
> transport which would use the same virtio-blk protocol but only send
> scsi commands.  We'd need a different driver on the Linux side for it
> that registers with the scsi layer.  On the QEMU side it could either do
> pass-through to a real scsi device using SG_IO or use the existing

Yes, virtio-scsi is also something we were thinking of.  The last time
we discussed this idea of SCSI passthrough internally, we stumbled over
error recovery.  Who is responsible for the error recovery: the host,
the guest, or both?  Are there problems which will trigger error
recovery in the guest and host midlayer at the same time?  To be honest,
my scsi knowledge is very limited, so I don't know if that is a real
problem.

Christian
On Tue, Apr 28, 2009 at 11:37:01AM -0500, Anthony Liguori wrote:
> Ah, excellent.  I think that's a great thing to do.  So do you think
> virtio-scsi would deprecate virtio-blk?

I don't think so.  If you have an image format or a non-scsi block
device underneath, virtio-blk avoids the encoding into SCSI CDBs and
back, and should be faster.

> I've gotten a number of requests for exposing stuff like VPD through
> virtio-blk.  There's also a surprising amount of software out there
> that assumes /dev/sdX or /dev/hdX naming.

That software should have been fixed a while ago already; there are
lots of non-scsi, non-ide block devices.
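To illustrate the encoding difference Christoph refers to: a virtio-blk
read is described by a fixed 16-byte header the host consumes directly,
while a SCSI transport wraps the same request in a CDB that must be
built by the guest and parsed again by the host.  A sketch for
comparison (the outhdr layout matches the virtio-blk driver; the
READ(10) layout is the standard SBC encoding):

#include <stdint.h>

/* virtio-blk request header: 16 bytes, used directly by the host. */
struct virtio_blk_outhdr {
    uint32_t type;      /* VIRTIO_BLK_T_IN for a read */
    uint32_t ioprio;
    uint64_t sector;    /* starting sector, in 512-byte units */
};

/* The same read expressed as a READ(10) CDB, as a virtio-scsi style
 * transport would carry it -- encoded on one side, decoded on the other:
 *
 *   byte 0      0x28 (READ(10) opcode)
 *   byte 1      flags
 *   bytes 2-5   logical block address, big-endian
 *   byte 6      group number
 *   bytes 7-8   transfer length in blocks, big-endian
 *   byte 9      control
 */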
On Tue, Apr 28, 2009 at 09:09:52PM +0200, Christian Borntraeger wrote:
> Yes, virtio-scsi is also something we were thinking of.  The last time
> we discussed this idea of SCSI passthrough internally, we stumbled over
> error recovery.  Who is responsible for the error recovery: the host,
> the guest, or both?  Are there problems which will trigger error
> recovery in the guest and host midlayer at the same time?  To be honest,
> my scsi knowledge is very limited, so I don't know if that is a real
> problem.
I'm not that far with planning yet.  The best way would probably be to
do real error handling in the host.
On Wednesday 29 April 2009 12:50:04, Christoph Hellwig wrote:
> On Tue, Apr 28, 2009 at 09:09:52PM +0200, Christian Borntraeger wrote:
> > Yes, virtio-scsi is also something we were thinking of.  The last time
> > we discussed this idea of SCSI passthrough internally, we stumbled over
> > error recovery.  Who is responsible for the error recovery: the host,
> > the guest, or both?  Are there problems which will trigger error
> > recovery in the guest and host midlayer at the same time?
>
> I'm not that far with planning yet.  The best way would probably be to
> do real error handling in the host.

Yes, that was my impression too.  The tricky part is the timeout-based
recovery found in the scsi midlayer (e.g. scsi_error.c).  In case of a
timeout, the guest and the host will start recovery at the same time.

Christian
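One conceivable way to avoid that double recovery, sketched under the
assumption of a future virtio-scsi guest driver (the hook and the
timeout values are hypothetical, not an agreed design): make the guest's
request timeout comfortably larger than the host's, so host-side
recovery always triggers and completes first.

#include <linux/blkdev.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Hypothetical virtio-scsi guest driver hook: give the guest midlayer a
 * much longer request timeout than the host's SG_IO timeout, so the
 * host's error recovery always runs before the guest's does. */
static int virtio_scsi_slave_configure(struct scsi_device *sdev)
{
	/* assumes a host-side timeout of ~30s; values are illustrative */
	blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
	return 0;
}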
On Wednesday 29 April 2009, Christoph Hellwig wrote:
> On Tue, Apr 28, 2009 at 11:37:01AM -0500, Anthony Liguori wrote:
> > Ah, excellent.  I think that's a great thing to do.  So do you think
> > virtio-scsi would deprecate virtio-blk?
>
> I don't think so.  If you have an image format or a non-scsi block
> device underneath, virtio-blk avoids the encoding into SCSI CDBs and
> back, and should be faster.

Is this actually measurably faster, or just infinitesimally faster in
theory?

I can maybe see that virtio-blk is slightly simpler for dumb drivers,
though even then a basic scsi host is pretty straightforward, and I find
it hard to believe there's much real benefit.

Paul
On Wednesday 29 April 2009 13:11:19, Paul Brook wrote:
> On Wednesday 29 April 2009, Christoph Hellwig wrote:
> > I don't think so.  If you have an image format or a non-scsi block
> > device underneath, virtio-blk avoids the encoding into SCSI CDBs and
> > back, and should be faster.
>
> Is this actually measurably faster, or just infinitesimally faster in
> theory?

If the underlying backing is 4k block size and the emulated scsi disk
has 512 byte block size, write performance can be a lot slower.
On Wed, Apr 29, 2009 at 12:11:19PM +0100, Paul Brook wrote:
> Is this actually measurably faster, or just infinitesimally faster in theory?
On normal disks it's rather theoretical. On high-end SSDs and arrays the
impact is noticeable, mostly due to the additional latency.
On Wednesday 29 April 2009, Christian Borntraeger wrote:
> > Is this actually measurably faster, or just infinitesimally faster in
> > theory?
>
> If the underlying backing is 4k block size and the emulated scsi disk
> has 512 byte block size, write performance can be a lot slower.

Rubbish.  If this is really a problem you can just use a scsi disk with
4k sectors.

Paul
On Wednesday 29 April 2009, Christoph Hellwig wrote:
> On Wed, Apr 29, 2009 at 12:11:19PM +0100, Paul Brook wrote:
> > Is this actually measurably faster, or just infinitesimally faster in
> > theory?
>
> On normal disks it's rather theoretical.  On high-end SSDs and arrays
> the impact is noticeable, mostly due to the additional latency.

How exactly does it introduce additional latency?  A scsi command block
is hardly large or complicated.  Are you suggesting that a 16/32-byte
scsi command takes significantly longer to process than a 16-byte virtio
command descriptor?  I'd expect any extra processing to be a small
fraction of the host syscall latency, let alone the latency of the
physical host adapter.  It probably even fits on the same CPU cache
line.

Paul
On Wed, Apr 29, 2009 at 12:37:20PM +0100, Paul Brook wrote:
> How exactly does it introduce additional latency?  A scsi command block
> is hardly large or complicated.  Are you suggesting that a 16/32-byte
> scsi command takes significantly longer to process than a 16-byte virtio
> command descriptor?  I'd expect any extra processing to be a small
> fraction of the host syscall latency, let alone the latency of the
> physical host adapter.  It probably even fits on the same CPU cache
> line.

Encoding the scsi CDB is additional work, but I would be surprised if it
is measurable.  Just using scsi cdbs would be simple enough; the bigger
issue is emulating a full-blown scsi bus, because then you need to make
all kinds of queueing decisions at target level etc. and drag in a
complicated scsi stack, not just a simple block driver, in the guest.
And at least on current linux kernels that does introduce measurable
latency.

Now it might be possible to get that latency down to a level where we
can ignore it, but when doing all this additional work there always will
be additional overhead.
On Thu, 2009-04-30 at 22:13 +0200, Christoph Hellwig wrote:
> Encoding the scsi CDB is additional work, but I would be surprised if it
> is measurable.  Just using scsi cdbs would be simple enough; the bigger
> issue is emulating a full-blown scsi bus, because then you need to make
> all kinds of queueing decisions at target level etc. and drag in a
> complicated scsi stack, not just a simple block driver, in the guest.
> And at least on current linux kernels that does introduce measurable
> latency.
>
> Now it might be possible to get that latency down to a level where we
> can ignore it, but when doing all this additional work there always will
> be additional overhead.

/me puts on SCSI target mode hat

The other obvious benefit of allowing passthrough SCSI block devices
into KVM guests via virtio-blk is that at some point those SCSI block
devices could be coming from a local target mode stack that is
representing LVM block devices as SCSI-3 storage, or say FILEIO on top
of a btrfs mount also representing SCSI-3 storage, along with the
typical hardware passthroughs for local (KVM host) accessible SCSI
devices.

The important part is that KVM guests using SG_IO passthrough with
virtio-blk (assuming that they actually showed up as SCSI devices in
the KVM guest) would be able to take advantage of existing Linux SCSI
I/O fencing functionality that is used for H/A, and for cluster
filesystems like GFS in RHEL, etc.  Also, using scsi_dh_alua in Linux
KVM guests against SCSI-3 compatible block devices through the
passthrough means you could do some interesting things with controlling
bandwidth and paths using asymmetric access port states from local
storage into a Linux KVM guest.

How much latency overhead a SCSI passthrough (with a pseudo SCSI bus)
in the KVM guest would add compared to native virtio-blk would be the
key metric.  Now if the KVM SCSI passthrough was talking in SCSI-target
mode directly to drivers/scsi (instead of going through Block -> SCSI,
for example) it would actually involve less overhead on the KVM host,
IMHO.

For the usage of BLOCK and FILEIO devices that have SCSI-3 emulation on
top (and are then passed into the KVM guest), the emulation would
obviously add processing overhead, but for those folks who are
interested in SCSI I/O fencing in the KVM guest using existing tools,
the performance overhead would be a reasonable tradeoff on machines
with a fast memory and point-to-point bus architecture.

--nab
On Thursday 30 April 2009, Christoph Hellwig wrote:
> Encoding the scsi CDB is additional work, but I would be surprised if it
> is measurable.  Just using scsi cdbs would be simple enough; the bigger
> issue is emulating a full-blown scsi bus, because then you need to make
> all kinds of queueing decisions at target level etc. and drag in a
> complicated scsi stack, not just a simple block driver, in the guest.
> And at least on current linux kernels that does introduce measurable
> latency.

Only if you emulate a crufty old parallel scsi bus, and that's just
silly.  One of the nice things about scsi is that it separates the
command set from the transport layer.  cf. USB mass-storage, SAS,
SBP-2 (FireWire), and probably several others I've forgotten.

Paul
On Thu, Apr 30, 2009 at 4:49 PM, Paul Brook <paul@codesourcery.com> wrote:
> One of the nice things about scsi is that it separates the command set
> from the transport layer.  cf. USB mass-storage, SAS, SBP-2 (FireWire),
> and probably several others I've forgotten.

ATAPI, SCSIoFC, SCSIoIB....
On Thu, Apr 30, 2009 at 10:49:19PM +0100, Paul Brook wrote:
> Only if you emulate a crufty old parallel scsi bus, and that's just
> silly.  One of the nice things about scsi is that it separates the
> command set from the transport layer.  cf. USB mass-storage, SAS,
> SBP-2 (FireWire), and probably several others I've forgotten.

It has nothing to do with an SPI bus.  Everything that resembles a SAM
architecture can have multiple LUs per target, and multiple targets per
initiator port, so we need all the complex queueing code, and we need
error handling and and and.

For some really simple ones like USB mass-storage we might get away
with some ad-hoc SCSI CDB generation instead of a real scsi stack, but
I would recommend against it.
Christoph Hellwig wrote:
> It has nothing to do with an SPI bus.  Everything that resembles a SAM
> architecture can have multiple LUs per target, and multiple targets per
> initiator port, so we need all the complex queueing code, and we need
> error handling and and and.

If you're using virtio-block to connect to lots of LUNs on lots of
targets (i.e. lots of block devices), don't you need similar queuing
code and error handling for all that too?

-- Jamie
On Fri, May 01, 2009 at 12:08:06PM +0100, Jamie Lokier wrote:
> If you're using virtio-block to connect to lots of LUNs on lots of
> targets (i.e. lots of block devices), don't you need similar queuing
> code and error handling for all that too?

Currently there's a 1:1 relation of virtio drivers and virtio queues.

Note that I don't argue against virtio-scsi; I brought this up first.
I just think that virtio-blk is actually nicer for some use cases (and
of course we'll have to keep it for backwards compatibility anyway).
On Fri, 1 May 2009 05:43:50 am Christoph Hellwig wrote:
> Encoding the scsi CDB is additional work, but I would be surprised if it
> is measurable.  Just using scsi cdbs would be simple enough; the bigger
> issue is emulating a full-blown scsi bus, because then you need to make
> all kinds of queueing decisions at target level etc. and drag in a
> complicated scsi stack, not just a simple block driver, in the guest.
> And at least on current linux kernels that does introduce measurable
> latency.
>
> Now it might be possible to get that latency down to a level where we
> can ignore it, but when doing all this additional work there always will
> be additional overhead.

But Paul, if you're enthusiastic you could patch lguest and show how
simple it is :)

Cheers,
Rusty.
Index: qemu/hw/virtio-blk.h
===================================================================
--- qemu.orig/hw/virtio-blk.h	2009-04-26 16:50:38.154074532 +0200
+++ qemu/hw/virtio-blk.h	2009-04-26 22:51:16.838076869 +0200
@@ -28,6 +28,9 @@
 #define VIRTIO_BLK_F_SIZE_MAX   1       /* Indicates maximum segment size */
 #define VIRTIO_BLK_F_SEG_MAX    2       /* Indicates maximum # of segments */
 #define VIRTIO_BLK_F_GEOMETRY   4       /* Indicates support of legacy geometry */
+#define VIRTIO_BLK_F_RO         5       /* Disk is read-only */
+#define VIRTIO_BLK_F_BLK_SIZE   6       /* Block size of disk is available */
+#define VIRTIO_BLK_F_SCSI       7       /* Supports scsi command passthru */
 
 struct virtio_blk_config
 {
@@ -70,6 +73,15 @@ struct virtio_blk_inhdr
     unsigned char status;
 };
 
+/* SCSI pass-through header */
+struct virtio_scsi_inhdr
+{
+    uint32_t errors;
+    uint32_t data_len;
+    uint32_t sense_len;
+    uint32_t residual;
+};
+
 void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs);
 
 #endif
Index: qemu/hw/virtio-blk.c
===================================================================
--- qemu.orig/hw/virtio-blk.c	2009-04-26 16:50:38.160074667 +0200
+++ qemu/hw/virtio-blk.c	2009-04-27 10:25:19.278074514 +0200
@@ -15,6 +15,9 @@
 #include <sysemu.h>
 #include "virtio-blk.h"
 #include "block_int.h"
+#ifdef __linux__
+# include <scsi/sg.h>
+#endif
 
 typedef struct VirtIOBlock
 {
@@ -35,6 +38,7 @@ typedef struct VirtIOBlockReq
     VirtQueueElement elem;
     struct virtio_blk_inhdr *in;
     struct virtio_blk_outhdr *out;
+    struct virtio_scsi_inhdr *scsi;
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
 } VirtIOBlockReq;
@@ -103,6 +107,108 @@ static VirtIOBlockReq *virtio_blk_get_re
     return req;
 }
 
+#ifdef __linux__
+static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
+{
+    struct sg_io_hdr hdr;
+    int ret, size = 0;
+    int status;
+    int i;
+
+    /*
+     * We require at least one output segment each for the virtio_blk_outhdr
+     * and the SCSI command block.
+     *
+     * We also at least require the virtio_blk_inhdr, the virtio_scsi_inhdr
+     * and the sense buffer pointer in the input segments.
+     */
+    if (req->elem.out_num < 2 || req->elem.in_num < 3) {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+        return;
+    }
+
+    /*
+     * No support for bidirectional commands yet.
+     */
+    if (req->elem.out_num > 2 && req->elem.in_num > 3) {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+        return;
+    }
+
+    /*
+     * The scsi inhdr is placed in the second-to-last input segment, just
+     * before the regular inhdr.
+     */
+    req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
+    size = sizeof(*req->in) + sizeof(*req->scsi);
+
+    memset(&hdr, 0, sizeof(struct sg_io_hdr));
+    hdr.interface_id = 'S';
+    hdr.cmd_len = req->elem.out_sg[1].iov_len;
+    hdr.cmdp = req->elem.out_sg[1].iov_base;
+    hdr.dxfer_len = 0;
+
+    if (req->elem.out_num > 2) {
+        /*
+         * If there are more than the minimally required 2 output segments
+         * there is write payload starting from the third iovec.
+         */
+        hdr.dxfer_direction = SG_DXFER_TO_DEV;
+        hdr.iovec_count = req->elem.out_num - 2;
+
+        for (i = 0; i < hdr.iovec_count; i++)
+            hdr.dxfer_len += req->elem.out_sg[i + 2].iov_len;
+
+        hdr.dxferp = req->elem.out_sg + 2;
+
+    } else if (req->elem.in_num > 3) {
+        /*
+         * If we have more than 3 input segments the guest wants to actually
+         * read data.
+         */
+        hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+        hdr.iovec_count = req->elem.in_num - 3;
+        for (i = 0; i < hdr.iovec_count; i++)
+            hdr.dxfer_len += req->elem.in_sg[i].iov_len;
+
+        hdr.dxferp = req->elem.in_sg;
+        size += hdr.dxfer_len;
+    } else {
+        /*
+         * Some SCSI commands don't actually transfer any data.
+         */
+        hdr.dxfer_direction = SG_DXFER_NONE;
+    }
+
+    hdr.sbp = req->elem.in_sg[req->elem.in_num - 3].iov_base;
+    hdr.mx_sb_len = req->elem.in_sg[req->elem.in_num - 3].iov_len;
+    size += hdr.mx_sb_len;
+
+    ret = bdrv_ioctl(req->dev->bs, SG_IO, &hdr);
+    if (ret) {
+        status = VIRTIO_BLK_S_UNSUPP;
+        hdr.status = ret;
+        hdr.resid = hdr.dxfer_len;
+    } else if (hdr.status) {
+        status = VIRTIO_BLK_S_IOERR;
+    } else {
+        status = VIRTIO_BLK_S_OK;
+    }
+
+    req->scsi->errors = hdr.status;
+    req->scsi->residual = hdr.resid;
+    req->scsi->sense_len = hdr.sb_len_wr;
+    req->scsi->data_len = hdr.dxfer_len;
+
+    virtio_blk_req_complete(req, status);
+}
+#else
+static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
+{
+    virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+}
+#endif /* __linux__ */
+
 static void virtio_blk_handle_write(VirtIOBlockReq *req)
 {
     bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
@@ -136,12 +242,7 @@ static void virtio_blk_handle_output(Vir
         req->in = (void *)req->elem.in_sg[req->elem.in_num - 1].iov_base;
 
         if (req->out->type & VIRTIO_BLK_T_SCSI_CMD) {
-            unsigned int len = sizeof(*req->in);
-
-            req->in->status = VIRTIO_BLK_S_UNSUPP;
-            virtqueue_push(vq, &req->elem, len);
-            virtio_notify(vdev, vq);
-            qemu_free(req);
+            virtio_blk_handle_scsi(req);
         } else if (req->out->type & VIRTIO_BLK_T_OUT) {
             qemu_iovec_init_external(&req->qiov, &req->elem.out_sg[1],
                                      req->elem.out_num - 1);
@@ -203,7 +304,15 @@ static void virtio_blk_update_config(Vir
 
 static uint32_t virtio_blk_get_features(VirtIODevice *vdev)
 {
-    return (1 << VIRTIO_BLK_F_SEG_MAX | 1 << VIRTIO_BLK_F_GEOMETRY);
+    uint32_t features = 0;
+
+    features |= (1 << VIRTIO_BLK_F_SEG_MAX);
+    features |= (1 << VIRTIO_BLK_F_GEOMETRY);
+#ifdef __linux__
+    features |= (1 << VIRTIO_BLK_F_SCSI);
+#endif
+
+    return features;
 }
 
 static void virtio_blk_save(QEMUFile *f, void *opaque)
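For reference while reading the patch, this is the virtqueue layout that
virtio_blk_handle_scsi() expects, reconstructed from its segment checks
and indexing:

/*
 * SG_IO request layout on the virtqueue (as implemented above):
 *
 *   out_sg[0]             struct virtio_blk_outhdr (type = VIRTIO_BLK_T_SCSI_CMD)
 *   out_sg[1]             the SCSI CDB
 *   out_sg[2..out_num-1]  write payload, if any (SG_DXFER_TO_DEV)
 *
 *   in_sg[0..in_num-4]    read payload, if any (SG_DXFER_FROM_DEV)
 *   in_sg[in_num-3]       sense buffer
 *   in_sg[in_num-2]       struct virtio_scsi_inhdr (errors, residual, ...)
 *   in_sg[in_num-1]       struct virtio_blk_inhdr (status byte)
 *
 * A request with both extra out and extra in segments (bidirectional)
 * is rejected with VIRTIO_BLK_S_UNSUPP.
 */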
[had the qemu list address wrong the first time, reply to this message,
not the previous if you were on Cc]

Add support for SG_IO passthru (packet commands) to the virtio-blk
backend.  Conceptually based on an older patch from Hannes Reinecke
but largely rewritten to match the code structure and layering in
virtio-blk.

Note that currently we issue the host SG_IO synchronously.  We could
easily switch to async I/O, but that would require either bloating
the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
memory allocation for each SG_IO request.

Signed-off-by: Christoph Hellwig <hch@lst.de>