
[9/9] bsg: split handling of SCSI CDBs vs transport requeues

Message ID 20171003104845.10417-10-hch@lst.de (mailing list archive)
State Not Applicable

Commit Message

Christoph Hellwig Oct. 3, 2017, 10:48 a.m. UTC
The current BSG design tries to shoe-horn the transport-specific passthrough
commands into the overall framework for SCSI passthrough requests.  This
has a couple problems:

 - each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag
   despite not dealing with SCSI commands at all.  Because of that these
   queues could also incorrectly accept SCSI commands from in-kernel
   users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl.
 - the real SCSI bsg queues also incorrectly accept bsg requests of the
   BSG_SUB_PROTOCOL_SCSI_TRANSPORT type
 - the bsg transport code is almost unreadable because it tries to reuse
   different SCSI concepts for its own purpose.

This patch instead adds a new bsg_ops structure to handle the two cases
differently, and thus solves all of the above problems.  Another side
effect is that the bsg-lib queues also don't need to embed a
struct scsi_request anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bsg-lib.c                   | 158 +++++++++++++++--------
 block/bsg.c                       | 257 +++++++++++++++++---------------------
 drivers/scsi/scsi_lib.c           |   4 +-
 drivers/scsi/scsi_sysfs.c         |   3 +-
 drivers/scsi/scsi_transport_sas.c |   1 -
 include/linux/bsg-lib.h           |   4 +-
 include/linux/bsg.h               |  35 ++++--
 7 files changed, 251 insertions(+), 211 deletions(-)
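
In concrete terms, the new operations table added by the patch looks like
this (condensed from include/linux/bsg.h below):

struct bsg_ops {
	int	(*check_proto)(struct sg_io_v4 *hdr);
	int	(*fill_hdr)(struct request *rq, struct sg_io_v4 *hdr,
				fmode_t mode);
	int	(*complete_rq)(struct request *rq, struct sg_io_v4 *hdr);
	void	(*free_rq)(struct request *rq);
};

block/bsg.c provides bsg_scsi_ops for real SCSI device nodes and
block/bsg-lib.c provides bsg_transport_ops for transport passthrough
queues; bsg_map_hdr() dispatches header validation and request setup
through q->bsg_dev.ops->check_proto()/->fill_hdr(), and the completion
path goes through ->complete_rq()/->free_rq() instead of assuming a
struct scsi_request in every queue.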

Comments

Hannes Reinecke Oct. 4, 2017, 6:26 a.m. UTC | #1
On 10/03/2017 12:48 PM, Christoph Hellwig wrote:
> The current BSG design tries to shoe-horn the transport-specific passthrough
> commands into the overall framework for SCSI passthrough requests.  This
> has a couple problems:
> 
>  - each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag
>    despite not dealing with SCSI commands at all.  Because of that these
>    queues could also incorrectly accept SCSI commands from in-kernel
>    users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl.
>  - the real SCSI bsg queues also incorrectly accept bsg requests of the
>    BSG_SUB_PROTOCOL_SCSI_TRANSPORT type
>  - the bsg transport code is almost unreadable because it tries to reuse
>    different SCSI concepts for its own purpose.
> 
> This patch instead adds a new bsg_ops structure to handle the two cases
> differently, and thus solves all of the above problems.  Another side
> effect is that the bsg-lib queues also don't need to embed a
> struct scsi_request anymore.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bsg-lib.c                   | 158 +++++++++++++++--------
>  block/bsg.c                       | 257 +++++++++++++++++---------------------
>  drivers/scsi/scsi_lib.c           |   4 +-
>  drivers/scsi/scsi_sysfs.c         |   3 +-
>  drivers/scsi/scsi_transport_sas.c |   1 -
>  include/linux/bsg-lib.h           |   4 +-
>  include/linux/bsg.h               |  35 ++++--
>  7 files changed, 251 insertions(+), 211 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
Johannes Thumshirn Oct. 4, 2017, 7:18 a.m. UTC | #2
Christoph Hellwig <hch@lst.de> writes:
[...]
> @@ -965,7 +932,8 @@ void bsg_unregister_queue(struct request_queue *q)
>  EXPORT_SYMBOL_GPL(bsg_unregister_queue);
>  
>  int bsg_register_queue(struct request_queue *q, struct device *parent,
> -		       const char *name, void (*release)(struct device *))
> +		const char *name, const struct bsg_ops *ops,
> +		void (*release)(struct device *))

Wouldn't it make sense to put the ->release() method into bsg_ops as
well? The current prototype of bsg_register_queue isn't exactly what I
would call a sane API.
Christoph Hellwig Oct. 4, 2017, 7:20 a.m. UTC | #4
On Wed, Oct 04, 2017 at 09:18:11AM +0200, Johannes Thumshirn wrote:
> Wouldn't it make sense to put the ->release() method into bsg_ops as
> well? The current prototype of bsg_register_queue isn't exactly what I
> would call a sane API.

It's a different level of callback - ops describe the type of request
passed through (scsi vs transport) and ->release is a wacky
implementation detail of the SAS passthrough.  If anything, ->release
should go away eventually by cleaning that mess up.
Johannes Thumshirn Oct. 4, 2017, 8:52 a.m. UTC | #5
On Wed, Oct 04, 2017 at 09:20:59AM +0200, Christoph Hellwig wrote:
> It's a different level of callback - ops describe the type of request
> passed through (scsi vs transport) and ->release is a wacky
> implementation detail of the SAS passthrough.  If anything, ->release
> should go away eventually by cleaning that mess up.

OK then,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Benjamin Block Oct. 19, 2017, 3:59 p.m. UTC | #6
Hey Christoph,

better late than never I guess.

On Tue, Oct 03, 2017 at 12:48:45PM +0200, Christoph Hellwig wrote:
> The current BSG design tries to shoe-horn the transport-specific passthrough
> commands into the overall framework for SCSI passthrough requests.  This
> has a couple problems:
>
>  - each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag
>    despite not dealing with SCSI commands at all.  Because of that these
>    queues could also incorrectly accept SCSI commands from in-kernel
>    users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl.
>  - the real SCSI bsg queues also incorrectly accept bsg requests of the
>    BSG_SUB_PROTOCOL_SCSI_TRANSPORT type
>  - the bsg transport code is almost unreadable because it tries to reuse
>    different SCSI concepts for its own purpose.
>
> This patch instead adds a new bsg_ops structure to handle the two cases
> differently, and thus solves all of the above problems.  Another side
> effect is that the bsg-lib queues also don't need to embed a
> struct scsi_request anymore.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bsg-lib.c                   | 158 +++++++++++++++--------
>  block/bsg.c                       | 257 +++++++++++++++++---------------------
>  drivers/scsi/scsi_lib.c           |   4 +-
>  drivers/scsi/scsi_sysfs.c         |   3 +-
>  drivers/scsi/scsi_transport_sas.c |   1 -
>  include/linux/bsg-lib.h           |   4 +-
>  include/linux/bsg.h               |  35 ++++--
>  7 files changed, 251 insertions(+), 211 deletions(-)
>
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index 6299526bd2c3..99b459e21782 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -27,6 +27,94 @@
>  #include <linux/bsg-lib.h>
>  #include <linux/export.h>
>  #include <scsi/scsi_cmnd.h>
> +#include <scsi/sg.h>
> +
> +#define ptr64(val) ((void __user *)(uintptr_t)(val))

Better to reflect the special property, that it is a user pointer, in
the name of the macro. Maybe something like user_ptr(64). The same
comment for the same macro in bsg.c.

> +
> +static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
> +{
> +	if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
> +	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
> +		return -EINVAL;
> +	if (!capable(CAP_SYS_RAWIO))
> +		return -EPERM;

Any particular reason why this is not symmetric with bsg_scsi? IOW
permission checking done in bsg_transport_fill_hdr(), like it is done in
bsg_scsi_fill_hdr()?

We might save some time copying memory with this (we also only talk
about ~20 bytes here), but on the other hand the interface would be more
clean otherwise IMO (if we already do restructure the interface) -
similar callbacks have similar responsibilities.

> +	return 0;
> +}
> +
> +static int bsg_transport_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
> +		fmode_t mode)
> +{
> +	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> +
> +	job->request_len = hdr->request_len;
> +	job->request = memdup_user(ptr64(hdr->request), hdr->request_len);
> +	if (IS_ERR(job->request))
> +		return PTR_ERR(job->request);
> +	return 0;
> +}
> +
> +static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
> +{
> +	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> +	int ret = 0;
> +
> +	/*
> +	 * The assignments below don't make much sense, but are kept for
> +	 * bug by bug backwards compatibility:
> +	 */
> +	hdr->device_status = job->result & 0xff;
> +	hdr->transport_status = host_byte(job->result);
> +	hdr->driver_status = driver_byte(job->result);
> +	hdr->info = 0;
> +	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
> +		hdr->info |= SG_INFO_CHECK;
> +	hdr->response_len = 0;
> +
> +	if (job->result < 0) {
> +		/* we're only returning the result field in the reply */
> +		job->reply_len = sizeof(u32);
> +		ret = job->result;
> +	}
> +
> +	if (job->reply_len && hdr->response) {
> +		int len = min(hdr->max_response_len, job->reply_len);
> +
> +		if (copy_to_user(ptr64(hdr->response), job->reply, len))
> +			ret = -EFAULT;
> +		else
> +			hdr->response_len = len;

Very minor nitpick: this is the reverse of the handling in
bsg_scsi_complete_rq(); the two could be identical.

> +	}
> +
> +	/* we assume all request payload was transferred, residual == 0 */
> +	hdr->dout_resid = 0;
> +
> +	if (rq->next_rq) {
> +		unsigned int rsp_len = blk_rq_bytes(rq->next_rq);
> +
> +		if (WARN_ON(job->reply_payload_rcv_len > rsp_len))
> +			hdr->din_resid = 0;

If I understand this right, this reflects the old code, just written
down a little differently.

But I wonder why we do that? Wouldn't that be interesting to know for
userspace, if more was received than it allocated space for? Isn't that the
typical residual overrun case (similar to LUN scanning in SCSI common
code), and din_resid is signed after all? Well I guess it could be an
ABI break, I don't know.

Ah well, at least the documentation for 'struct sg_io_v4' makes no such
restriction (that it cannot be below 0).

Just a thought I had while reading it.

> +		else
> +			hdr->din_resid = rsp_len - job->reply_payload_rcv_len;
> +	} else {
> +		hdr->din_resid = 0;
> +	}
> +
> +	return ret;
> +}
> +
> +static void bsg_transport_free_rq(struct request *rq)
> +{
> +	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> +
> +	kfree(job->request);
> +}
> +
> +static const struct bsg_ops bsg_transport_ops = {
> +	.check_proto		= bsg_transport_check_proto,
> +	.fill_hdr		= bsg_transport_fill_hdr,
> +	.complete_rq		= bsg_transport_complete_rq,
> +	.free_rq		= bsg_transport_free_rq,
> +};
>
>  /**
>   * bsg_teardown_job - routine to teardown a bsg job
> @@ -68,27 +156,9 @@ EXPORT_SYMBOL_GPL(bsg_job_get);
>  void bsg_job_done(struct bsg_job *job, int result,
>  		  unsigned int reply_payload_rcv_len)
>  {
> -	struct request *req = blk_mq_rq_from_pdu(job);
> -	struct request *rsp = req->next_rq;
> -	int err;
> -
> -	err = job->sreq.result = result;
> -	if (err < 0)
> -		/* we're only returning the result field in the reply */
> -		job->sreq.sense_len = sizeof(u32);
> -	else
> -		job->sreq.sense_len = job->reply_len;
> -	/* we assume all request payload was transferred, residual == 0 */
> -	job->sreq.resid_len = 0;
> -
> -	if (rsp) {
> -		WARN_ON(reply_payload_rcv_len > scsi_req(rsp)->resid_len);
> -
> -		/* set reply (bidi) residual */
> -		scsi_req(rsp)->resid_len -=
> -			min(reply_payload_rcv_len, scsi_req(rsp)->resid_len);
> -	}
> -	blk_complete_request(req);
> +	job->result = result;
> +	job->reply_payload_rcv_len = reply_payload_rcv_len;
> +	blk_complete_request(blk_mq_rq_from_pdu(job));
>  }
>  EXPORT_SYMBOL_GPL(bsg_job_done);
>
> @@ -113,7 +183,6 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
>  	if (!buf->sg_list)
>  		return -ENOMEM;
>  	sg_init_table(buf->sg_list, req->nr_phys_segments);
> -	scsi_req(req)->resid_len = blk_rq_bytes(req);
>  	buf->sg_cnt = blk_rq_map_sg(req->q, req, buf->sg_list);
>  	buf->payload_len = blk_rq_bytes(req);
>  	return 0;
> @@ -124,16 +193,13 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
>   * @dev: device that is being sent the bsg request
>   * @req: BSG request that needs a job structure
>   */
> -static int bsg_prepare_job(struct device *dev, struct request *req)
> +static bool bsg_prepare_job(struct device *dev, struct request *req)
>  {
>  	struct request *rsp = req->next_rq;
> -	struct scsi_request *rq = scsi_req(req);
>  	struct bsg_job *job = blk_mq_rq_to_pdu(req);
>  	int ret;
>
>  	job->timeout = req->timeout;
> -	job->request = rq->cmd;
> -	job->request_len = rq->cmd_len;
>
>  	if (req->bio) {
>  		ret = bsg_map_buffer(&job->request_payload, req);
> @@ -149,12 +215,13 @@ static int bsg_prepare_job(struct device *dev, struct request *req)
>  	/* take a reference for the request */
>  	get_device(job->dev);
>  	kref_init(&job->kref);
> -	return 0;
> +	return true;
>
>  failjob_rls_rqst_payload:
>  	kfree(job->request_payload.sg_list);
>  failjob_rls_job:
> -	return -ENOMEM;
> +	job->result = -ENOMEM;
> +	return false;
>  }
>
>  /**
> @@ -183,9 +250,7 @@ static void bsg_request_fn(struct request_queue *q)
>  			break;
>  		spin_unlock_irq(q->queue_lock);
>
> -		ret = bsg_prepare_job(dev, req);
> -		if (ret) {
> -			scsi_req(req)->result = ret;
> +		if (!bsg_prepare_job(dev, req)) {
>  			blk_end_request_all(req, BLK_STS_OK);
>  			spin_lock_irq(q->queue_lock);
>  			continue;
> @@ -202,46 +267,34 @@ static void bsg_request_fn(struct request_queue *q)
>  	spin_lock_irq(q->queue_lock);
>  }
>
> +/* called right after the request is allocated for the request_queue */
>  static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
>  {
>  	struct bsg_job *job = blk_mq_rq_to_pdu(req);
> -	struct scsi_request *sreq = &job->sreq;
> -
> -	/* called right after the request is allocated for the request_queue */
>
> -	sreq->sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
> -	if (!sreq->sense)
> +	job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);

One suggestion here. Maybe we could get rid of this implicit knowledge
about SCSI_SENSE_BUFFERSIZE being the max size for a bsg-reply?
Especially if we use this patch and get rid of other similarities (like
using scsi_request).

Maybe we can just define an extra macro in bsg-lib.c, or in one of the
headers, and define its size to be SCSI_SENSE_BUFFERSIZE (for now) and
then use that in all cases.

I tried something similar some time ago if you remember, but I couldn't
follow up because other stuff got more important in the meantime. One
could also statically check the transport reply-types against that.

This way, should we need to change that value for a specific transport, we
only need to change one knob, and not 10 (I guess SCSI_SENSE_BUFFERSIZE
could not be changed for such cases ;) ).

> +	if (!job->reply)
>  		return -ENOMEM;
> -
>  	return 0;
>  }
>
> +/* called right before the request is given to the request_queue user */
>  static void bsg_initialize_rq(struct request *req)
>  {
>  	struct bsg_job *job = blk_mq_rq_to_pdu(req);
> -	struct scsi_request *sreq = &job->sreq;
> -	void *sense = sreq->sense;
> -
> -	/* called right before the request is given to the request_queue user */
> +	void *reply = job->reply;
>
>  	memset(job, 0, sizeof(*job));
> -
> -	scsi_req_init(sreq);
> -
> -	sreq->sense = sense;
> -	sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
> -
> -	job->reply = sense;
> -	job->reply_len = sreq->sense_len;
> +	job->reply = reply;
> +	job->reply_len = SCSI_SENSE_BUFFERSIZE;
>  	job->dd_data = job + 1;
>  }
>
>  static void bsg_exit_rq(struct request_queue *q, struct request *req)
>  {
>  	struct bsg_job *job = blk_mq_rq_to_pdu(req);
> -	struct scsi_request *sreq = &job->sreq;
>
> -	kfree(sreq->sense);
> +	kfree(job->reply);
>  }
>
>  /**
> @@ -274,11 +327,10 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
>  	q->queuedata = dev;
>  	q->bsg_job_fn = job_fn;
>  	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
> -	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>  	blk_queue_softirq_done(q, bsg_softirq_done);
>  	blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
>
> -	ret = bsg_register_queue(q, dev, name, release);
> +	ret = bsg_register_queue(q, dev, name, &bsg_transport_ops, release);
>  	if (ret) {
>  		printk(KERN_ERR "%s: bsg interface failed to "
>  		       "initialize - register queue\n", dev->kobj.name);
> diff --git a/block/bsg.c b/block/bsg.c
> index 452f94f1c5d4..04407cedff09 100644
> --- a/block/bsg.c
> +++ b/block/bsg.c
> @@ -135,113 +135,124 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
>  	return &bsg_device_list[index & (BSG_LIST_ARRAY_SIZE - 1)];
>  }
>
> -static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
> -				struct sg_io_v4 *hdr, struct bsg_device *bd,
> -				fmode_t mode)
> +#define ptr64(val) ((void __user *)(uintptr_t)(val))

Same as comment as for the same macro in bsg-lib.c.

> +
> +static int bsg_scsi_check_proto(struct sg_io_v4 *hdr)
> +{
> +	if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
> +	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_CMD)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int bsg_scsi_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
> +		fmode_t mode)
>  {
> -	struct scsi_request *req = scsi_req(rq);
> +	struct scsi_request *sreq = scsi_req(rq);
>
> -	if (hdr->request_len > BLK_MAX_CDB) {
> -		req->cmd = kzalloc(hdr->request_len, GFP_KERNEL);
> -		if (!req->cmd)
> +	sreq->cmd_len = hdr->request_len;
> +	if (sreq->cmd_len > BLK_MAX_CDB) {
> +		sreq->cmd = kzalloc(sreq->cmd_len, GFP_KERNEL);
> +		if (!sreq->cmd)
>  			return -ENOMEM;
>  	}
>
> -	if (copy_from_user(req->cmd, (void __user *)(unsigned long)hdr->request,
> -			   hdr->request_len))
> +	if (copy_from_user(sreq->cmd, ptr64(hdr->request), sreq->cmd_len))
>  		return -EFAULT;
> -
> -	if (hdr->subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) {
> -		if (blk_verify_command(req->cmd, mode))
> -			return -EPERM;
> -	} else if (!capable(CAP_SYS_RAWIO))
> +	if (blk_verify_command(sreq->cmd, mode))
>  		return -EPERM;
> -
> -	/*
> -	 * fill in request structure
> -	 */
> -	req->cmd_len = hdr->request_len;
> -
> -	rq->timeout = msecs_to_jiffies(hdr->timeout);
> -	if (!rq->timeout)
> -		rq->timeout = q->sg_timeout;
> -	if (!rq->timeout)
> -		rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
> -	if (rq->timeout < BLK_MIN_SG_TIMEOUT)
> -		rq->timeout = BLK_MIN_SG_TIMEOUT;
> -
>  	return 0;
>  }
>
> -/*
> - * Check if sg_io_v4 from user is allowed and valid
> - */
> -static int
> -bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
> +static int bsg_scsi_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
>  {
> +	struct scsi_request *sreq = scsi_req(rq);
>  	int ret = 0;
>
> -	if (hdr->guard != 'Q')
> -		return -EINVAL;
> +	/*
> +	 * fill in all the output members
> +	 */
> +	hdr->device_status = sreq->result & 0xff;
> +	hdr->transport_status = host_byte(sreq->result);
> +	hdr->driver_status = driver_byte(sreq->result);
> +	hdr->info = 0;
> +	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
> +		hdr->info |= SG_INFO_CHECK;
> +	hdr->response_len = 0;
>
> -	switch (hdr->protocol) {
> -	case BSG_PROTOCOL_SCSI:
> -		switch (hdr->subprotocol) {
> -		case BSG_SUB_PROTOCOL_SCSI_CMD:
> -		case BSG_SUB_PROTOCOL_SCSI_TRANSPORT:
> -			break;
> -		default:
> -			ret = -EINVAL;
> -		}
> -		break;
> -	default:
> -		ret = -EINVAL;
> +	if (sreq->sense_len && hdr->response) {
> +		int len = min_t(unsigned int, hdr->max_response_len,
> +					sreq->sense_len);
> +
> +		if (!copy_to_user(ptr64(hdr->response), sreq->sense, len))
> +			hdr->response_len = len;
> +		else
> +			ret = -EFAULT;
> +	}
> +
> +	if (rq->next_rq) {
> +		hdr->dout_resid = sreq->resid_len;
> +		hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
> +	} else if (rq_data_dir(rq) == READ) {
> +		hdr->din_resid = sreq->resid_len;
> +	} else {
> +		hdr->dout_resid = sreq->resid_len;
>  	}
>
> -	*op = hdr->dout_xfer_len ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN;
>  	return ret;
>  }
>
> -/*
> - * map sg_io_v4 to a request.
> - */
> +static void bsg_scsi_free_rq(struct request *rq)
> +{
> +	scsi_req_free_cmd(scsi_req(rq));
> +}
> +
> +static const struct bsg_ops bsg_scsi_ops = {
> +	.check_proto		= bsg_scsi_check_proto,
> +	.fill_hdr		= bsg_scsi_fill_hdr,
> +	.complete_rq		= bsg_scsi_complete_rq,
> +	.free_rq		= bsg_scsi_free_rq,
> +};
> +
>  static struct request *
> -bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t mode)
> +bsg_map_hdr(struct request_queue *q, struct sg_io_v4 *hdr, fmode_t mode)
>  {
> -	struct request_queue *q = bd->queue;
>  	struct request *rq, *next_rq = NULL;
>  	int ret;
> -	unsigned int op, dxfer_len;
> -	void __user *dxferp = NULL;
> -	struct bsg_class_device *bcd = &q->bsg_dev;
>
> -	/* if the LLD has been removed then the bsg_unregister_queue will
> -	 * eventually be called and the class_dev was freed, so we can no
> -	 * longer use this request_queue. Return no such address.
> -	 */

Why remove the comment? Has that changed?

> -	if (!bcd->class_dev)
> +	if (!q->bsg_dev.class_dev)
>  		return ERR_PTR(-ENXIO);
>
>  	dprintk("map hdr %llx/%u %llx/%u\n", (unsigned long long) hdr->dout_xferp,
>  		hdr->dout_xfer_len, (unsigned long long) hdr->din_xferp,
>  		hdr->din_xfer_len);
>
> -	ret = bsg_validate_sgv4_hdr(hdr, &op);
> +	if (hdr->guard != 'Q')
> +		return ERR_PTR(-EINVAL);
> +
> +	ret = q->bsg_dev.ops->check_proto(hdr);
>  	if (ret)
>  		return ERR_PTR(ret);
>
> -	/*
> -	 * map scatter-gather elements separately and string them to request
> -	 */
> -	rq = blk_get_request(q, op, GFP_KERNEL);
> +	rq = blk_get_request(q, hdr->dout_xfer_len ?
> +			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN,
> +			GFP_KERNEL);
>  	if (IS_ERR(rq))
>  		return rq;
>
> -	ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, mode);
> +	ret = q->bsg_dev.ops->fill_hdr(rq, hdr, mode);
>  	if (ret)
>  		goto out;
>
> -	if (op == REQ_OP_SCSI_OUT && hdr->din_xfer_len) {
> +	rq->timeout = msecs_to_jiffies(hdr->timeout);
> +	if (!rq->timeout)
> +		rq->timeout = rq->q->sg_timeout;

No need to use the rq pointer, you already have a variable q with the
same content.

> +	if (!rq->timeout)
> +		rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
> +	if (rq->timeout < BLK_MIN_SG_TIMEOUT)
> +		rq->timeout = BLK_MIN_SG_TIMEOUT;
> +
> +	if (hdr->dout_xfer_len && hdr->din_xfer_len) {
>  		if (!test_bit(QUEUE_FLAG_BIDI, &q->queue_flags)) {
>  			ret = -EOPNOTSUPP;
>  			goto out;
> @@ -250,42 +261,39 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t mode)
>  		next_rq = blk_get_request(q, REQ_OP_SCSI_IN, GFP_KERNEL);
>  		if (IS_ERR(next_rq)) {
>  			ret = PTR_ERR(next_rq);
> -			next_rq = NULL;
>  			goto out;
>  		}
> -		rq->next_rq = next_rq;
>
> -		dxferp = (void __user *)(unsigned long)hdr->din_xferp;
> -		ret =  blk_rq_map_user(q, next_rq, NULL, dxferp,
> +		rq->next_rq = next_rq;
> +		ret = blk_rq_map_user(q, next_rq, NULL, ptr64(hdr->din_xferp),
>  				       hdr->din_xfer_len, GFP_KERNEL);
>  		if (ret)
> -			goto out;
> +			goto out_free_nextrq;
>  	}
>
>  	if (hdr->dout_xfer_len) {
> -		dxfer_len = hdr->dout_xfer_len;
> -		dxferp = (void __user *)(unsigned long)hdr->dout_xferp;
> +		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->dout_xferp),
> +				hdr->dout_xfer_len, GFP_KERNEL);
>  	} else if (hdr->din_xfer_len) {
> -		dxfer_len = hdr->din_xfer_len;
> -		dxferp = (void __user *)(unsigned long)hdr->din_xferp;
> -	} else
> -		dxfer_len = 0;
> -
> -	if (dxfer_len) {
> -		ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
> -				      GFP_KERNEL);
> -		if (ret)
> -			goto out;
> +		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->din_xferp),
> +				hdr->din_xfer_len, GFP_KERNEL);
> +	} else {
> +		ret = blk_rq_map_user(q, rq, NULL, NULL, 0, GFP_KERNEL);

Why do we behave differently in this case now? To prevent special
handling elsewhere? Otherwise it seems a bit pointless/error-prone
mapping zero length to nothing.

>  	}
>
> +	if (ret)
> +		goto out_unmap_nextrq;
>  	return rq;
> +
> +out_unmap_nextrq:
> +	if (rq->next_rq)
> +		blk_rq_unmap_user(rq->next_rq->bio);
> +out_free_nextrq:
> +	if (rq->next_rq)
> +		blk_put_request(rq->next_rq);
>  out:
> -	scsi_req_free_cmd(scsi_req(rq));
> +	q->bsg_dev.ops->free_rq(rq);
>  	blk_put_request(rq);
> -	if (next_rq) {
> -		blk_rq_unmap_user(next_rq->bio);
> -		blk_put_request(next_rq);
> -	}
>  	return ERR_PTR(ret);
>  }
>
> @@ -387,56 +395,20 @@ static struct bsg_command *bsg_get_done_cmd(struct bsg_device *bd)
>  static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
>  				    struct bio *bio, struct bio *bidi_bio)
>  {
> -	struct scsi_request *req = scsi_req(rq);
> -	int ret = 0;
> +	int ret;
>
>  	dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
> -	/*
> -	 * fill in all the output members
> -	 */
> -	hdr->device_status = req->result & 0xff;
> -	hdr->transport_status = host_byte(req->result);
> -	hdr->driver_status = driver_byte(req->result);
> -	hdr->info = 0;
> -	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
> -		hdr->info |= SG_INFO_CHECK;
> -	hdr->response_len = 0;
>
> -	if (req->sense_len && hdr->response) {
> -		int len = min_t(unsigned int, hdr->max_response_len,
> -					req->sense_len);
> -
> -		ret = copy_to_user((void __user *)(unsigned long)hdr->response,
> -				   req->sense, len);
> -		if (!ret)
> -			hdr->response_len = len;
> -		else
> -			ret = -EFAULT;
> -	}
> +	ret = rq->q->bsg_dev.ops->complete_rq(rq, hdr);
>
>  	if (rq->next_rq) {
> -		hdr->dout_resid = req->resid_len;
> -		hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
>  		blk_rq_unmap_user(bidi_bio);
>  		blk_put_request(rq->next_rq);
> -	} else if (rq_data_dir(rq) == READ)
> -		hdr->din_resid = req->resid_len;
> -	else
> -		hdr->dout_resid = req->resid_len;
> -
> -	/*
> -	 * If the request generated a negative error number, return it
> -	 * (providing we aren't already returning an error); if it's
> -	 * just a protocol response (i.e. non negative), that gets
> -	 * processed above.
> -	 */
> -	if (!ret && req->result < 0)
> -		ret = req->result;
> +	}
>
>  	blk_rq_unmap_user(bio);
> -	scsi_req_free_cmd(req);
> +	rq->q->bsg_dev.ops->free_rq(rq);
>  	blk_put_request(rq);
> -
>  	return ret;
>  }
>
> @@ -618,7 +590,7 @@ static int __bsg_write(struct bsg_device *bd, const char __user *buf,
>  		/*
>  		 * get a request, fill in the blanks, and add to request queue
>  		 */
> -		rq = bsg_map_hdr(bd, &bc->hdr, mode);
> +		rq = bsg_map_hdr(bd->queue, &bc->hdr, mode);
>  		if (IS_ERR(rq)) {
>  			ret = PTR_ERR(rq);
>  			rq = NULL;
> @@ -748,11 +720,6 @@ static struct bsg_device *bsg_add_device(struct inode *inode,
>  	unsigned char buf[32];
>  #endif
>
> -	if (!blk_queue_scsi_passthrough(rq)) {
> -		WARN_ONCE(true, "Attempt to register a non-SCSI queue\n");
> -		return ERR_PTR(-EINVAL);
> -	}
> -
>  	if (!blk_get_queue(rq))
>  		return ERR_PTR(-ENXIO);
>
> @@ -913,7 +880,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  		if (copy_from_user(&hdr, uarg, sizeof(hdr)))
>  			return -EFAULT;
>
> -		rq = bsg_map_hdr(bd, &hdr, file->f_mode);
> +		rq = bsg_map_hdr(bd->queue, &hdr, file->f_mode);
>  		if (IS_ERR(rq))
>  			return PTR_ERR(rq);
>
> @@ -965,7 +932,8 @@ void bsg_unregister_queue(struct request_queue *q)
>  EXPORT_SYMBOL_GPL(bsg_unregister_queue);
>
>  int bsg_register_queue(struct request_queue *q, struct device *parent,
> -		       const char *name, void (*release)(struct device *))
> +		const char *name, const struct bsg_ops *ops,
> +		void (*release)(struct device *))
>  {
>  	struct bsg_class_device *bcd;
>  	dev_t dev;
> @@ -1002,6 +970,7 @@ int bsg_register_queue(struct request_queue *q, struct device *parent,
>  	bcd->queue = q;
>  	bcd->parent = get_device(parent);
>  	bcd->release = release;
> +	bcd->ops = ops;
>  	kref_init(&bcd->ref);
>  	dev = MKDEV(bsg_major, bcd->minor);
>  	class_dev = device_create(bsg_class, parent, dev, NULL, "%s", devname);
> @@ -1029,7 +998,17 @@ int bsg_register_queue(struct request_queue *q, struct device *parent,
>  	mutex_unlock(&bsg_mutex);
>  	return ret;
>  }
> -EXPORT_SYMBOL_GPL(bsg_register_queue);
> +
> +int bsg_scsi_register_queue(struct request_queue *q, struct device *parent)
> +{
> +	if (!blk_queue_scsi_passthrough(q)) {
> +		WARN_ONCE(true, "Attempt to register a non-SCSI queue\n");
> +		return -EINVAL;
> +	}
> +
> +	return bsg_register_queue(q, parent, NULL, &bsg_scsi_ops, NULL);
> +}
> +EXPORT_SYMBOL_GPL(bsg_scsi_register_queue);
>
>  static struct cdev bsg_cdev;
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 9cf6a80fe297..8a404ff29f9c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2105,8 +2105,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>  {
>  	struct device *dev = shost->dma_dev;
>
> -	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
> -
>  	/*
>  	 * this limit is imposed by hardware restrictions
>  	 */
> @@ -2202,6 +2200,7 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
>  	}
>
>  	__scsi_init_queue(shost, q);
> +	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>  	blk_queue_prep_rq(q, scsi_prep_fn);
>  	blk_queue_unprep_rq(q, scsi_unprep_fn);
>  	blk_queue_softirq_done(q, scsi_softirq_done);
> @@ -2231,6 +2230,7 @@ struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
>
>  	sdev->request_queue->queuedata = sdev;
>  	__scsi_init_queue(sdev->host, sdev->request_queue);
> +	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
>  	return sdev->request_queue;
>  }
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index bf53356f41f0..bfb4e6875643 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1262,8 +1262,7 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>  	transport_add_device(&sdev->sdev_gendev);
>  	sdev->is_visible = 1;
>
> -	error = bsg_register_queue(rq, &sdev->sdev_gendev, NULL, NULL);
> -
> +	error = bsg_scsi_register_queue(rq, &sdev->sdev_gendev);
>  	if (error)
>  		/* we're treating error on bsg register as non-fatal,
>  		 * so pretend nothing went wrong */
> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> index 736a1f4f9676..4889bd432382 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -228,7 +228,6 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
>  	 */
>  	blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
>  	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
> -	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>  	return 0;
>  }
>
> diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
> index 08762d297cbd..28a7ccc55c89 100644
> --- a/include/linux/bsg-lib.h
> +++ b/include/linux/bsg-lib.h
> @@ -38,7 +38,6 @@ struct bsg_buffer {
>  };
>
>  struct bsg_job {
> -	struct scsi_request sreq;
>  	struct device *dev;
>
>  	struct kref kref;
> @@ -64,6 +63,9 @@ struct bsg_job {
>  	struct bsg_buffer request_payload;
>  	struct bsg_buffer reply_payload;
>
> +	int result;
> +	unsigned int reply_payload_rcv_len;
> +
>  	void *dd_data;		/* Used for driver-specific storage */
>  };
>
> diff --git a/include/linux/bsg.h b/include/linux/bsg.h
> index 7173f6e9d2dd..ed98946a8324 100644
> --- a/include/linux/bsg.h
> +++ b/include/linux/bsg.h
> @@ -1,33 +1,42 @@
> -#ifndef BSG_H
> -#define BSG_H
> +#ifndef _LINUX_BSG_H
> +#define _LINUX_BSG_H
>
>  #include <uapi/linux/bsg.h>
>
> +struct request;
> +
> +#ifdef CONFIG_BLK_DEV_BSG
> +struct bsg_ops {
> +	int	(*check_proto)(struct sg_io_v4 *hdr);
> +	int	(*fill_hdr)(struct request *rq, struct sg_io_v4 *hdr,
> +				fmode_t mode);
> +	int	(*complete_rq)(struct request *rq, struct sg_io_v4 *hdr);
> +	void	(*free_rq)(struct request *rq);
> +};
>
> -#if defined(CONFIG_BLK_DEV_BSG)
>  struct bsg_class_device {
>  	struct device *class_dev;
>  	struct device *parent;
>  	int minor;
>  	struct request_queue *queue;
>  	struct kref ref;
> +	const struct bsg_ops *ops;
>  	void (*release)(struct device *);
>  };
>
> -extern int bsg_register_queue(struct request_queue *q,
> -			      struct device *parent, const char *name,
> -			      void (*release)(struct device *));
> -extern void bsg_unregister_queue(struct request_queue *);
> +int bsg_register_queue(struct request_queue *q, struct device *parent,
> +		const char *name, const struct bsg_ops *ops,
> +		void (*release)(struct device *));
> +int bsg_scsi_register_queue(struct request_queue *q, struct device *parent);
> +void bsg_unregister_queue(struct request_queue *);
>  #else
> -static inline int bsg_register_queue(struct request_queue *q,
> -				     struct device *parent, const char *name,
> -				     void (*release)(struct device *))
> +static inline int bsg_scsi_register_queue(struct request_queue *q,
> +		struct device *parent)
>  {
>  	return 0;
>  }
>  static inline void bsg_unregister_queue(struct request_queue *q)
>  {
>  }
> -#endif
> -
> -#endif
> +#endif /* CONFIG_BLK_DEV_BSG */
> +#endif /* _LINUX_BSG_H */
> --
> 2.14.1
>

Otherwise I haven't seen anything. I'll run it through my tests on zFCP
and look whether I see something strange happening.

I like the idea of splitting things up here; it gets rid of some code
duplication and unnecessary double book-keeping. Although I have to say,
looking at functions like bsg_transport_complete_rq() and
bsg_scsi_complete_rq(), it also creates some new duplication that wasn't
there before.
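
Purely as an illustration (this helper is hypothetical, not something from
this series), the shared status and response handling could be folded into
something like:

static int bsg_complete_common(struct sg_io_v4 *hdr, int result,
		const void *reply, unsigned int reply_len)
{
	/* common tail of bsg_scsi_complete_rq() and
	 * bsg_transport_complete_rq(): fill in the status bytes and copy
	 * the reply/sense data back to the user buffer */
	hdr->device_status = result & 0xff;
	hdr->transport_status = host_byte(result);
	hdr->driver_status = driver_byte(result);
	hdr->info = 0;
	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
		hdr->info |= SG_INFO_CHECK;
	hdr->response_len = 0;

	if (reply_len && hdr->response) {
		unsigned int len = min_t(unsigned int, hdr->max_response_len,
					 reply_len);

		if (copy_to_user((void __user *)(uintptr_t)hdr->response,
				 reply, len))
			return -EFAULT;
		hdr->response_len = len;
	}
	return 0;
}

The two ->complete_rq() implementations would then mostly differ in how
they derive the residuals and merge the final error code.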


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
Christoph Hellwig Oct. 20, 2017, 4:26 p.m. UTC | #7
On Thu, Oct 19, 2017 at 05:59:33PM +0200, Benjamin Block wrote:
> > +#define ptr64(val) ((void __user *)(uintptr_t)(val))
> 
> Better to reflect the special property, that it is a user pointer, in
> the name of the macro. Maybe something like user_ptr(64). The same
> comment for the same macro in bsg.c.

Not sure it's worth it especially now that Martin has merged the patch.
But given how many interfaces we have all over the kernel that use a u64
to store a user pointer in ioctls and similar it might make sense to
lift a helper like this to a generic header.  In that case we'll need
a more descriptive name for sure.
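
For what it's worth, include/linux/kernel.h already carries a
u64_to_user_ptr() helper that is roughly the following, so the "generic
header" part may largely be done and bsg could probably just switch over
(sketch of its current shape, from memory):

#define u64_to_user_ptr(x) (		\
{					\
	typecheck(u64, x);		\
	(void __user *)(uintptr_t)x;	\
}					\
)

with usage like:

	job->request = memdup_user(u64_to_user_ptr(hdr->request),
				   hdr->request_len);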

> > +static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
> > +{
> > +	if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
> > +	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
> > +		return -EINVAL;
> > +	if (!capable(CAP_SYS_RAWIO))
> > +		return -EPERM;
> 
> Any particular reason why this is not symmetric with bsg_scsi? IOW
> permission checking done in bsg_transport_fill_hdr(), like it is done in
> bsg_scsi_fill_hdr()?
> 
> We might save some time copying memory with this (we also only talk
> about ~20 bytes here), but on the other hand the interface would be more
> clean otherwise IMO (if we already do restructure the interface) -
> similar callbacks have similar responsibilities.

I could move the capable check around, not sure why I had done it that
way, it's been a while.  Probably because blk_verify_command needs the
CDB while a simple capable() check does not.
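
Folding it in would only be a couple of lines; a sketch with the names
from this patch (the rearrangement itself is untested):

static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
{
	if (hdr->protocol != BSG_PROTOCOL_SCSI ||
	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
		return -EINVAL;
	return 0;
}

static int bsg_transport_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
		fmode_t mode)
{
	struct bsg_job *job = blk_mq_rq_to_pdu(rq);

	/* mirrors bsg_scsi_fill_hdr(), which does its permission check
	 * (blk_verify_command()) at this point rather than in
	 * ->check_proto() */
	if (!capable(CAP_SYS_RAWIO))
		return -EPERM;

	job->request_len = hdr->request_len;
	job->request = memdup_user(ptr64(hdr->request), hdr->request_len);
	if (IS_ERR(job->request))
		return PTR_ERR(job->request);
	return 0;
}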

> If I understand this right, this reflects the old code, just written
> down a little differently.
> 
> But I wonder why we do that? Wouldn't that be interesting to know for
> userspace, if more was received than it allocated space for? Isn't that the
> typical residual overrun case (similar to LUN scanning in SCSI common
> code), and din_resid is signed after all? Well I guess it could be an
> ABI break, I don't know.
> 
> Ah well, at least the documentation for 'struct sg_io_v4' makes no such
> restriction (that it cannot be below 0).
> 
> Just a thought I had while reading it.

Maybe it would, but I really didn't want to change behavior.  If we
were to redo transport passthrough I would do it totally differently today.

> > +	job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
> 
> One suggestion here. Maybe we could get rid of this implicit knowledge
> about SCSI_SENSE_BUFFERSIZE being the max size for a bsg-reply?
> Especially if we use this patch and get rid of other similarities (like
> using scsi_request).
> 
> Maybe we can just define an extra macro in bsg-lib.c, or in one of the
> headers, and define its size to be SCSI_SENSE_BUFFERSIZE (for now) and
> then use that in all cases.
> 
> I tried something similar some time ago if you remember, but I couldn't
> follow up because other stuff got more important in the meantime. One
> could also statically check the transport reply-types against that.
> 
> This way, should we need to change that value for a specific transport, we
> only need to change one knob, and not 10 (I guess SCSI_SENSE_BUFFERSIZE
> could not be changed for such cases ;) ).

There shouldn't be any dependencies on SCSI_SENSE_BUFFERSIZE left,
so yes, this could be cleaned up.  Great opportunity for a follow-on
patch.
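
Something along these lines, where BSG_DEFAULT_REPLY_SIZE is just a name
made up for the sketch and keeps today's 96-byte value:

/* e.g. in include/linux/bsg-lib.h */
#define BSG_DEFAULT_REPLY_SIZE	96	/* currently == SCSI_SENSE_BUFFERSIZE */

/* block/bsg-lib.c then loses its last tie to the SCSI sense buffer */
static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
{
	struct bsg_job *job = blk_mq_rq_to_pdu(req);

	job->reply = kzalloc(BSG_DEFAULT_REPLY_SIZE, gfp);
	if (!job->reply)
		return -ENOMEM;
	return 0;
}

static void bsg_initialize_rq(struct request *req)
{
	struct bsg_job *job = blk_mq_rq_to_pdu(req);
	void *reply = job->reply;

	memset(job, 0, sizeof(*job));
	job->reply = reply;
	job->reply_len = BSG_DEFAULT_REPLY_SIZE;
	job->dd_data = job + 1;
}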

> > -	/* if the LLD has been removed then the bsg_unregister_queue will
> > -	 * eventually be called and the class_dev was freed, so we can no
> > -	 * longer use this request_queue. Return no such address.
> > -	 */
> 
> Why remove the comment? Has that changed?

Nothing, but then again it's standard behavior so the comment doesn't
really add any value.

> > +	rq->timeout = msecs_to_jiffies(hdr->timeout);
> > +	if (!rq->timeout)
> > +		rq->timeout = rq->q->sg_timeout;
> 
> No need to use the rq pointer, you already have a variable q with the
> same content.

True.

> > -		ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
> > -				      GFP_KERNEL);
> > -		if (ret)
> > -			goto out;
> > +		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->din_xferp),
> > +				hdr->din_xfer_len, GFP_KERNEL);
> > +	} else {
> > +		ret = blk_rq_map_user(q, rq, NULL, NULL, 0, GFP_KERNEL);
> 
> Why do we behave differently in this case now? To prevent special
> handling elsewhere? Otherwise it seems a bit pointless/error-prone
> mapping zero length to nothing.

Yes, this could be removed again.  I'll send a follow up.
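
Roughly, the follow-up would turn that spot into (sketch, not the actual
patch; ret is already zero here because ->fill_hdr() succeeded, so the
final else branch can simply go away):

	if (hdr->dout_xfer_len) {
		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->dout_xferp),
				hdr->dout_xfer_len, GFP_KERNEL);
	} else if (hdr->din_xfer_len) {
		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->din_xferp),
				hdr->din_xfer_len, GFP_KERNEL);
	}

	if (ret)
		goto out_unmap_nextrq;
	return rq;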
Benjamin Block Oct. 20, 2017, 4:47 p.m. UTC | #8
On Fri, Oct 20, 2017 at 06:26:30PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 19, 2017 at 05:59:33PM +0200, Benjamin Block wrote:
> > > +#define ptr64(val) ((void __user *)(uintptr_t)(val))
> > 
> > Better to reflect the special property, that it is a user pointer, in
> > the name of the macro. Maybe something like user_ptr(64). The same
> > comment for the same macro in bsg.c.
> 
> Not sure it's worth it especially now that Martin has merged the patch.

He did? I only saw a mail that he picked patches 2-5. So all the bsg
changes are still open I think.

(Maybe I just missed that, I haven't exactly followed the list very
closely as of late)

> But given how many interfaces we have all over the kernel that use a u64
> to store a user pointer in ioctls and similar it might make sense to
> lift a helper like this to a generic header.  In that case we'll need
> a more descriptive name for sure.
> 
> > > +static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
> > > +{
> > > +	if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
> > > +	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
> > > +		return -EINVAL;
> > > +	if (!capable(CAP_SYS_RAWIO))
> > > +		return -EPERM;
> > 
> > Any particular reason why this is not symmetric with bsg_scsi? IOW
> > permission checking done in bsg_transport_fill_hdr(), like it is done in
> > bsg_scsi_fill_hdr()?
> > 
> > We might save some time copying memory with this (we also only talk
> > about ~20 bytes here), but on the other hand the interface would be more
> > clean otherwise IMO (if we already do restructure the interface) -
> > similar callbacks have similar responsibilities.
> 
> I could move the capable check around, not sure why I had done it that
> way, it's been a while.  Probably because blk_verify_command needs the
> CDB while a simple capable() check does not.

That was my guess, too. I just thought it would be more consistent otherwise.
It's not a big thing, really.


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
Martin K. Petersen Oct. 23, 2017, 6:16 a.m. UTC | #9
Benjamin,

>> Not sure it's worth it especially now that Martin has merged the patch.
>
> He did? I only saw a mail that he picked patches 2-5. So all the bsg
> changes are still open I think.

Yes, I expected the bsg bits to go through Jens' tree.
Christoph Hellwig Oct. 23, 2017, 6:29 a.m. UTC | #10
On Mon, Oct 23, 2017 at 02:16:03AM -0400, Martin K. Petersen wrote:
> 
> Benjamin,
> 
> >> Not sure it's worth it especially now that Martin has merged the patch.
> >
> > He did? I only saw a mail that he picked patches 2-5. So all the bsg
> > changes are still open I think.
> 
> Yes, I expected the bsg bits to go through Jens' tree.

Ok, then I misremembered it, and we'll have to delay the remaining
patches until the next merge window, as they depend on the previous
ones.
Martin K. Petersen Oct. 23, 2017, 7:17 a.m. UTC | #11
Christoph,

>> Yes, I expected the bsg bits to go through Jens' tree.
>
> Ok, then I misremembered it, and we'll have to delay the remaining
> patches until the next merge window, as they depend on the previous
> ones.

I don't mind taking them through SCSI if Jens agrees.
Benjamin Block Oct. 24, 2017, 4:58 p.m. UTC | #12
> +static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
> +{
> +	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> +	int ret = 0;
> +
> +	/*
> +	 * The assignments below don't make much sense, but are kept for
> +	 * bug by bug backwards compatibility:
> +	 */
> +	hdr->device_status = job->result & 0xff;
> +	hdr->transport_status = host_byte(job->result);
> +	hdr->driver_status = driver_byte(job->result);
> +	hdr->info = 0;
> +	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
> +		hdr->info |= SG_INFO_CHECK;
> +	hdr->response_len = 0;
> +
> +	if (job->result < 0) {
> +		/* we're only returning the result field in the reply */
> +		job->reply_len = sizeof(u32);
> +		ret = job->result;
> +	}
> +
> +	if (job->reply_len && hdr->response) {
> +		int len = min(hdr->max_response_len, job->reply_len);
> +
> +		if (copy_to_user(ptr64(hdr->response), job->reply, len))
> +			ret = -EFAULT;
> +		else
> +			hdr->response_len = len;
> +	}
> +
> +	/* we assume all request payload was transferred, residual == 0 */
> +	hdr->dout_resid = 0;
> +
> +	if (rq->next_rq) {
> +		unsigned int rsp_len = blk_rq_bytes(rq->next_rq);
> +
> +		if (WARN_ON(job->reply_payload_rcv_len > rsp_len))

This gives me a lot of new warnings when running my tests on zFCP, none
of which happen when I run on 4.13.

After browsing the source a bit, I figured this is because, in contrast
to the old code, blk_rq_bytes() is now called after the blk request has
already been finished and blk_update_bidi_request()/blk_update_request()
have already run. Those update rq->__data_len, and thus the call here to
get rsp_len returns a wrong value. Hence the warnings, and the following
calculation is wrong as well.

I figure you can just replace this call to blk_rq_bytes() with
'job->reply_payload.payload_len'. It's essentially the same value, but
the latter is not changed after the job is committed, as far as I could
see in the source. Drivers make use of it, but only for reading, as far
as I could make out after browsing the code for a bit.

I did a quick test with that change in place and that seems to work fine
now. As far as my tests go, they behave as they did before.
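
Concretely, the change I tested is along these lines (a sketch against
the code in this patch):

	if (rq->next_rq) {
		/* payload_len is captured by bsg_map_buffer() at job setup
		 * time; blk_rq_bytes(rq->next_rq) has already been shrunk by
		 * blk_update_request() by the time we complete the request */
		unsigned int rsp_len = job->reply_payload.payload_len;

		if (WARN_ON(job->reply_payload_rcv_len > rsp_len))
			hdr->din_resid = 0;
		else
			hdr->din_resid = rsp_len - job->reply_payload_rcv_len;
	} else {
		hdr->din_resid = 0;
	}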


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block

> +			hdr->din_resid = 0;
> +		else
> +			hdr->din_resid = rsp_len - job->reply_payload_rcv_len;
> +	} else {
> +		hdr->din_resid = 0;
> +	}
> +
> +	return ret;
> +}
> +

--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
Jens Axboe Oct. 24, 2017, 5:46 p.m. UTC | #13
On 10/23/2017 01:17 AM, Martin K. Petersen wrote:
> 
> Christoph,
> 
>>> Yes, I expected the bsg bits to go through Jens' tree.
>>
>> Ok, then I misremembered it, and we'll have to delay the remaining
>> patches until the next merge window, as they depend on the previous
>> ones.
> 
> I don't mind taking them through SCSI if Jens agrees.

I'm fine with that, as long as the last issue Benjamin brought up has
been fixed up.

Patch

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 6299526bd2c3..99b459e21782 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -27,6 +27,94 @@ 
 #include <linux/bsg-lib.h>
 #include <linux/export.h>
 #include <scsi/scsi_cmnd.h>
+#include <scsi/sg.h>
+
+#define ptr64(val) ((void __user *)(uintptr_t)(val))
+
+static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
+{
+	if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
+	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
+		return -EINVAL;
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+	return 0;
+}
+
+static int bsg_transport_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
+		fmode_t mode)
+{
+	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
+
+	job->request_len = hdr->request_len;
+	job->request = memdup_user(ptr64(hdr->request), hdr->request_len);
+	if (IS_ERR(job->request))
+		return PTR_ERR(job->request);
+	return 0;
+}
+
+static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
+{
+	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
+	int ret = 0;
+
+	/*
+	 * The assignments below don't make much sense, but are kept for
+	 * bug by bug backwards compatibility:
+	 */
+	hdr->device_status = job->result & 0xff;
+	hdr->transport_status = host_byte(job->result);
+	hdr->driver_status = driver_byte(job->result);
+	hdr->info = 0;
+	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
+		hdr->info |= SG_INFO_CHECK;
+	hdr->response_len = 0;
+
+	if (job->result < 0) {
+		/* we're only returning the result field in the reply */
+		job->reply_len = sizeof(u32);
+		ret = job->result;
+	}
+
+	if (job->reply_len && hdr->response) {
+		int len = min(hdr->max_response_len, job->reply_len);
+
+		if (copy_to_user(ptr64(hdr->response), job->reply, len))
+			ret = -EFAULT;
+		else
+			hdr->response_len = len;
+	}
+
+	/* we assume all request payload was transferred, residual == 0 */
+	hdr->dout_resid = 0;
+
+	if (rq->next_rq) {
+		unsigned int rsp_len = blk_rq_bytes(rq->next_rq);
+
+		if (WARN_ON(job->reply_payload_rcv_len > rsp_len))
+			hdr->din_resid = 0;
+		else
+			hdr->din_resid = rsp_len - job->reply_payload_rcv_len;
+	} else {
+		hdr->din_resid = 0;
+	}
+
+	return ret;
+}
+
+static void bsg_transport_free_rq(struct request *rq)
+{
+	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
+
+	kfree(job->request);
+}
+
+static const struct bsg_ops bsg_transport_ops = {
+	.check_proto		= bsg_transport_check_proto,
+	.fill_hdr		= bsg_transport_fill_hdr,
+	.complete_rq		= bsg_transport_complete_rq,
+	.free_rq		= bsg_transport_free_rq,
+};
 
 /**
  * bsg_teardown_job - routine to teardown a bsg job
@@ -68,27 +156,9 @@  EXPORT_SYMBOL_GPL(bsg_job_get);
 void bsg_job_done(struct bsg_job *job, int result,
 		  unsigned int reply_payload_rcv_len)
 {
-	struct request *req = blk_mq_rq_from_pdu(job);
-	struct request *rsp = req->next_rq;
-	int err;
-
-	err = job->sreq.result = result;
-	if (err < 0)
-		/* we're only returning the result field in the reply */
-		job->sreq.sense_len = sizeof(u32);
-	else
-		job->sreq.sense_len = job->reply_len;
-	/* we assume all request payload was transferred, residual == 0 */
-	job->sreq.resid_len = 0;
-
-	if (rsp) {
-		WARN_ON(reply_payload_rcv_len > scsi_req(rsp)->resid_len);
-
-		/* set reply (bidi) residual */
-		scsi_req(rsp)->resid_len -=
-			min(reply_payload_rcv_len, scsi_req(rsp)->resid_len);
-	}
-	blk_complete_request(req);
+	job->result = result;
+	job->reply_payload_rcv_len = reply_payload_rcv_len;
+	blk_complete_request(blk_mq_rq_from_pdu(job));
 }
 EXPORT_SYMBOL_GPL(bsg_job_done);
 
@@ -113,7 +183,6 @@  static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
 	if (!buf->sg_list)
 		return -ENOMEM;
 	sg_init_table(buf->sg_list, req->nr_phys_segments);
-	scsi_req(req)->resid_len = blk_rq_bytes(req);
 	buf->sg_cnt = blk_rq_map_sg(req->q, req, buf->sg_list);
 	buf->payload_len = blk_rq_bytes(req);
 	return 0;
@@ -124,16 +193,13 @@  static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
  * @dev: device that is being sent the bsg request
  * @req: BSG request that needs a job structure
  */
-static int bsg_prepare_job(struct device *dev, struct request *req)
+static bool bsg_prepare_job(struct device *dev, struct request *req)
 {
 	struct request *rsp = req->next_rq;
-	struct scsi_request *rq = scsi_req(req);
 	struct bsg_job *job = blk_mq_rq_to_pdu(req);
 	int ret;
 
 	job->timeout = req->timeout;
-	job->request = rq->cmd;
-	job->request_len = rq->cmd_len;
 
 	if (req->bio) {
 		ret = bsg_map_buffer(&job->request_payload, req);
@@ -149,12 +215,13 @@  static int bsg_prepare_job(struct device *dev, struct request *req)
 	/* take a reference for the request */
 	get_device(job->dev);
 	kref_init(&job->kref);
-	return 0;
+	return true;
 
 failjob_rls_rqst_payload:
 	kfree(job->request_payload.sg_list);
 failjob_rls_job:
-	return -ENOMEM;
+	job->result = -ENOMEM;
+	return false;
 }
 
 /**
@@ -183,9 +250,7 @@  static void bsg_request_fn(struct request_queue *q)
 			break;
 		spin_unlock_irq(q->queue_lock);
 
-		ret = bsg_prepare_job(dev, req);
-		if (ret) {
-			scsi_req(req)->result = ret;
+		if (!bsg_prepare_job(dev, req)) {
 			blk_end_request_all(req, BLK_STS_OK);
 			spin_lock_irq(q->queue_lock);
 			continue;
@@ -202,46 +267,34 @@  static void bsg_request_fn(struct request_queue *q)
 	spin_lock_irq(q->queue_lock);
 }
 
+/* called right after the request is allocated for the request_queue */
 static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
 {
 	struct bsg_job *job = blk_mq_rq_to_pdu(req);
-	struct scsi_request *sreq = &job->sreq;
-
-	/* called right after the request is allocated for the request_queue */
 
-	sreq->sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
-	if (!sreq->sense)
+	job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
+	if (!job->reply)
 		return -ENOMEM;
-
 	return 0;
 }
 
+/* called right before the request is given to the request_queue user */
 static void bsg_initialize_rq(struct request *req)
 {
 	struct bsg_job *job = blk_mq_rq_to_pdu(req);
-	struct scsi_request *sreq = &job->sreq;
-	void *sense = sreq->sense;
-
-	/* called right before the request is given to the request_queue user */
+	void *reply = job->reply;
 
 	memset(job, 0, sizeof(*job));
-
-	scsi_req_init(sreq);
-
-	sreq->sense = sense;
-	sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
-
-	job->reply = sense;
-	job->reply_len = sreq->sense_len;
+	job->reply = reply;
+	job->reply_len = SCSI_SENSE_BUFFERSIZE;
 	job->dd_data = job + 1;
 }
 
 static void bsg_exit_rq(struct request_queue *q, struct request *req)
 {
 	struct bsg_job *job = blk_mq_rq_to_pdu(req);
-	struct scsi_request *sreq = &job->sreq;
 
-	kfree(sreq->sense);
+	kfree(job->reply);
 }
 
 /**
@@ -274,11 +327,10 @@  struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
 	q->queuedata = dev;
 	q->bsg_job_fn = job_fn;
 	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
-	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
 	blk_queue_softirq_done(q, bsg_softirq_done);
 	blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
 
-	ret = bsg_register_queue(q, dev, name, release);
+	ret = bsg_register_queue(q, dev, name, &bsg_transport_ops, release);
 	if (ret) {
 		printk(KERN_ERR "%s: bsg interface failed to "
 		       "initialize - register queue\n", dev->kobj.name);
diff --git a/block/bsg.c b/block/bsg.c
index 452f94f1c5d4..04407cedff09 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -135,113 +135,124 @@  static inline struct hlist_head *bsg_dev_idx_hash(int index)
 	return &bsg_device_list[index & (BSG_LIST_ARRAY_SIZE - 1)];
 }
 
-static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
-				struct sg_io_v4 *hdr, struct bsg_device *bd,
-				fmode_t mode)
+#define ptr64(val) ((void __user *)(uintptr_t)(val))
+
+static int bsg_scsi_check_proto(struct sg_io_v4 *hdr)
+{
+	if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
+	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_CMD)
+		return -EINVAL;
+	return 0;
+}
+
+static int bsg_scsi_fill_hdr(struct request *rq, struct sg_io_v4 *hdr,
+		fmode_t mode)
 {
-	struct scsi_request *req = scsi_req(rq);
+	struct scsi_request *sreq = scsi_req(rq);
 
-	if (hdr->request_len > BLK_MAX_CDB) {
-		req->cmd = kzalloc(hdr->request_len, GFP_KERNEL);
-		if (!req->cmd)
+	sreq->cmd_len = hdr->request_len;
+	if (sreq->cmd_len > BLK_MAX_CDB) {
+		sreq->cmd = kzalloc(sreq->cmd_len, GFP_KERNEL);
+		if (!sreq->cmd)
 			return -ENOMEM;
 	}
 
-	if (copy_from_user(req->cmd, (void __user *)(unsigned long)hdr->request,
-			   hdr->request_len))
+	if (copy_from_user(sreq->cmd, ptr64(hdr->request), sreq->cmd_len))
 		return -EFAULT;
-
-	if (hdr->subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) {
-		if (blk_verify_command(req->cmd, mode))
-			return -EPERM;
-	} else if (!capable(CAP_SYS_RAWIO))
+	if (blk_verify_command(sreq->cmd, mode))
 		return -EPERM;
-
-	/*
-	 * fill in request structure
-	 */
-	req->cmd_len = hdr->request_len;
-
-	rq->timeout = msecs_to_jiffies(hdr->timeout);
-	if (!rq->timeout)
-		rq->timeout = q->sg_timeout;
-	if (!rq->timeout)
-		rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
-	if (rq->timeout < BLK_MIN_SG_TIMEOUT)
-		rq->timeout = BLK_MIN_SG_TIMEOUT;
-
 	return 0;
 }
 
-/*
- * Check if sg_io_v4 from user is allowed and valid
- */
-static int
-bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
+static int bsg_scsi_complete_rq(struct request *rq, struct sg_io_v4 *hdr)
 {
+	struct scsi_request *sreq = scsi_req(rq);
 	int ret = 0;
 
-	if (hdr->guard != 'Q')
-		return -EINVAL;
+	/*
+	 * fill in all the output members
+	 */
+	hdr->device_status = sreq->result & 0xff;
+	hdr->transport_status = host_byte(sreq->result);
+	hdr->driver_status = driver_byte(sreq->result);
+	hdr->info = 0;
+	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
+		hdr->info |= SG_INFO_CHECK;
+	hdr->response_len = 0;
 
-	switch (hdr->protocol) {
-	case BSG_PROTOCOL_SCSI:
-		switch (hdr->subprotocol) {
-		case BSG_SUB_PROTOCOL_SCSI_CMD:
-		case BSG_SUB_PROTOCOL_SCSI_TRANSPORT:
-			break;
-		default:
-			ret = -EINVAL;
-		}
-		break;
-	default:
-		ret = -EINVAL;
+	if (sreq->sense_len && hdr->response) {
+		int len = min_t(unsigned int, hdr->max_response_len,
+					sreq->sense_len);
+
+		if (!copy_to_user(ptr64(hdr->response), sreq->sense, len))
+			hdr->response_len = len;
+		else
+			ret = -EFAULT;
+	}
+
+	if (rq->next_rq) {
+		hdr->dout_resid = sreq->resid_len;
+		hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
+	} else if (rq_data_dir(rq) == READ) {
+		hdr->din_resid = sreq->resid_len;
+	} else {
+		hdr->dout_resid = sreq->resid_len;
 	}
 
-	*op = hdr->dout_xfer_len ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN;
 	return ret;
 }
 
-/*
- * map sg_io_v4 to a request.
- */
+static void bsg_scsi_free_rq(struct request *rq)
+{
+	scsi_req_free_cmd(scsi_req(rq));
+}
+
+static const struct bsg_ops bsg_scsi_ops = {
+	.check_proto		= bsg_scsi_check_proto,
+	.fill_hdr		= bsg_scsi_fill_hdr,
+	.complete_rq		= bsg_scsi_complete_rq,
+	.free_rq		= bsg_scsi_free_rq,
+};
+
 static struct request *
-bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t mode)
+bsg_map_hdr(struct request_queue *q, struct sg_io_v4 *hdr, fmode_t mode)
 {
-	struct request_queue *q = bd->queue;
 	struct request *rq, *next_rq = NULL;
 	int ret;
-	unsigned int op, dxfer_len;
-	void __user *dxferp = NULL;
-	struct bsg_class_device *bcd = &q->bsg_dev;
 
-	/* if the LLD has been removed then the bsg_unregister_queue will
-	 * eventually be called and the class_dev was freed, so we can no
-	 * longer use this request_queue. Return no such address.
-	 */
-	if (!bcd->class_dev)
+	if (!q->bsg_dev.class_dev)
 		return ERR_PTR(-ENXIO);
 
 	dprintk("map hdr %llx/%u %llx/%u\n", (unsigned long long) hdr->dout_xferp,
 		hdr->dout_xfer_len, (unsigned long long) hdr->din_xferp,
 		hdr->din_xfer_len);
 
-	ret = bsg_validate_sgv4_hdr(hdr, &op);
+	if (hdr->guard != 'Q')
+		return ERR_PTR(-EINVAL);
+
+	ret = q->bsg_dev.ops->check_proto(hdr);
 	if (ret)
 		return ERR_PTR(ret);
 
-	/*
-	 * map scatter-gather elements separately and string them to request
-	 */
-	rq = blk_get_request(q, op, GFP_KERNEL);
+	rq = blk_get_request(q, hdr->dout_xfer_len ?
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN,
+			GFP_KERNEL);
 	if (IS_ERR(rq))
 		return rq;
 
-	ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, mode);
+	ret = q->bsg_dev.ops->fill_hdr(rq, hdr, mode);
 	if (ret)
 		goto out;
 
-	if (op == REQ_OP_SCSI_OUT && hdr->din_xfer_len) {
+	rq->timeout = msecs_to_jiffies(hdr->timeout);
+	if (!rq->timeout)
+		rq->timeout = rq->q->sg_timeout;
+	if (!rq->timeout)
+		rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
+	if (rq->timeout < BLK_MIN_SG_TIMEOUT)
+		rq->timeout = BLK_MIN_SG_TIMEOUT;
+
+	if (hdr->dout_xfer_len && hdr->din_xfer_len) {
 		if (!test_bit(QUEUE_FLAG_BIDI, &q->queue_flags)) {
 			ret = -EOPNOTSUPP;
 			goto out;
@@ -250,42 +261,39 @@  bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t mode)
 		next_rq = blk_get_request(q, REQ_OP_SCSI_IN, GFP_KERNEL);
 		if (IS_ERR(next_rq)) {
 			ret = PTR_ERR(next_rq);
-			next_rq = NULL;
 			goto out;
 		}
-		rq->next_rq = next_rq;
 
-		dxferp = (void __user *)(unsigned long)hdr->din_xferp;
-		ret =  blk_rq_map_user(q, next_rq, NULL, dxferp,
+		rq->next_rq = next_rq;
+		ret = blk_rq_map_user(q, next_rq, NULL, ptr64(hdr->din_xferp),
 				       hdr->din_xfer_len, GFP_KERNEL);
 		if (ret)
-			goto out;
+			goto out_free_nextrq;
 	}
 
 	if (hdr->dout_xfer_len) {
-		dxfer_len = hdr->dout_xfer_len;
-		dxferp = (void __user *)(unsigned long)hdr->dout_xferp;
+		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->dout_xferp),
+				hdr->dout_xfer_len, GFP_KERNEL);
 	} else if (hdr->din_xfer_len) {
-		dxfer_len = hdr->din_xfer_len;
-		dxferp = (void __user *)(unsigned long)hdr->din_xferp;
-	} else
-		dxfer_len = 0;
-
-	if (dxfer_len) {
-		ret = blk_rq_map_user(q, rq, NULL, dxferp, dxfer_len,
-				      GFP_KERNEL);
-		if (ret)
-			goto out;
+		ret = blk_rq_map_user(q, rq, NULL, ptr64(hdr->din_xferp),
+				hdr->din_xfer_len, GFP_KERNEL);
+	} else {
+		ret = blk_rq_map_user(q, rq, NULL, NULL, 0, GFP_KERNEL);
 	}
 
+	if (ret)
+		goto out_unmap_nextrq;
 	return rq;
+
+out_unmap_nextrq:
+	if (rq->next_rq)
+		blk_rq_unmap_user(rq->next_rq->bio);
+out_free_nextrq:
+	if (rq->next_rq)
+		blk_put_request(rq->next_rq);
 out:
-	scsi_req_free_cmd(scsi_req(rq));
+	q->bsg_dev.ops->free_rq(rq);
 	blk_put_request(rq);
-	if (next_rq) {
-		blk_rq_unmap_user(next_rq->bio);
-		blk_put_request(next_rq);
-	}
 	return ERR_PTR(ret);
 }
 
@@ -387,56 +395,20 @@  static struct bsg_command *bsg_get_done_cmd(struct bsg_device *bd)
 static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
 				    struct bio *bio, struct bio *bidi_bio)
 {
-	struct scsi_request *req = scsi_req(rq);
-	int ret = 0;
+	int ret;
 
 	dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
-	/*
-	 * fill in all the output members
-	 */
-	hdr->device_status = req->result & 0xff;
-	hdr->transport_status = host_byte(req->result);
-	hdr->driver_status = driver_byte(req->result);
-	hdr->info = 0;
-	if (hdr->device_status || hdr->transport_status || hdr->driver_status)
-		hdr->info |= SG_INFO_CHECK;
-	hdr->response_len = 0;
 
-	if (req->sense_len && hdr->response) {
-		int len = min_t(unsigned int, hdr->max_response_len,
-					req->sense_len);
-
-		ret = copy_to_user((void __user *)(unsigned long)hdr->response,
-				   req->sense, len);
-		if (!ret)
-			hdr->response_len = len;
-		else
-			ret = -EFAULT;
-	}
+	ret = rq->q->bsg_dev.ops->complete_rq(rq, hdr);
 
 	if (rq->next_rq) {
-		hdr->dout_resid = req->resid_len;
-		hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
 		blk_rq_unmap_user(bidi_bio);
 		blk_put_request(rq->next_rq);
-	} else if (rq_data_dir(rq) == READ)
-		hdr->din_resid = req->resid_len;
-	else
-		hdr->dout_resid = req->resid_len;
-
-	/*
-	 * If the request generated a negative error number, return it
-	 * (providing we aren't already returning an error); if it's
-	 * just a protocol response (i.e. non negative), that gets
-	 * processed above.
-	 */
-	if (!ret && req->result < 0)
-		ret = req->result;
+	}
 
 	blk_rq_unmap_user(bio);
-	scsi_req_free_cmd(req);
+	rq->q->bsg_dev.ops->free_rq(rq);
 	blk_put_request(rq);
-
 	return ret;
 }
 
@@ -618,7 +590,7 @@  static int __bsg_write(struct bsg_device *bd, const char __user *buf,
 		/*
 		 * get a request, fill in the blanks, and add to request queue
 		 */
-		rq = bsg_map_hdr(bd, &bc->hdr, mode);
+		rq = bsg_map_hdr(bd->queue, &bc->hdr, mode);
 		if (IS_ERR(rq)) {
 			ret = PTR_ERR(rq);
 			rq = NULL;
@@ -748,11 +720,6 @@  static struct bsg_device *bsg_add_device(struct inode *inode,
 	unsigned char buf[32];
 #endif
 
-	if (!blk_queue_scsi_passthrough(rq)) {
-		WARN_ONCE(true, "Attempt to register a non-SCSI queue\n");
-		return ERR_PTR(-EINVAL);
-	}
-
 	if (!blk_get_queue(rq))
 		return ERR_PTR(-ENXIO);
 
@@ -913,7 +880,7 @@  static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		if (copy_from_user(&hdr, uarg, sizeof(hdr)))
 			return -EFAULT;
 
-		rq = bsg_map_hdr(bd, &hdr, file->f_mode);
+		rq = bsg_map_hdr(bd->queue, &hdr, file->f_mode);
 		if (IS_ERR(rq))
 			return PTR_ERR(rq);
 
@@ -965,7 +932,8 @@  void bsg_unregister_queue(struct request_queue *q)
 EXPORT_SYMBOL_GPL(bsg_unregister_queue);
 
 int bsg_register_queue(struct request_queue *q, struct device *parent,
-		       const char *name, void (*release)(struct device *))
+		const char *name, const struct bsg_ops *ops,
+		void (*release)(struct device *))
 {
 	struct bsg_class_device *bcd;
 	dev_t dev;
@@ -1002,6 +970,7 @@  int bsg_register_queue(struct request_queue *q, struct device *parent,
 	bcd->queue = q;
 	bcd->parent = get_device(parent);
 	bcd->release = release;
+	bcd->ops = ops;
 	kref_init(&bcd->ref);
 	dev = MKDEV(bsg_major, bcd->minor);
 	class_dev = device_create(bsg_class, parent, dev, NULL, "%s", devname);
@@ -1029,7 +998,17 @@  int bsg_register_queue(struct request_queue *q, struct device *parent,
 	mutex_unlock(&bsg_mutex);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(bsg_register_queue);
+
+int bsg_scsi_register_queue(struct request_queue *q, struct device *parent)
+{
+	if (!blk_queue_scsi_passthrough(q)) {
+		WARN_ONCE(true, "Attempt to register a non-SCSI queue\n");
+		return -EINVAL;
+	}
+
+	return bsg_register_queue(q, parent, NULL, &bsg_scsi_ops, NULL);
+}
+EXPORT_SYMBOL_GPL(bsg_scsi_register_queue);
 
 static struct cdev bsg_cdev;
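
[ Illustration, not part of the patch: with the new hooks a transport can
  supply its own handlers instead of shoe-horning its requests into the
  SCSI CDB path.  A minimal, hypothetical transport-side bsg_ops might look
  like the sketch below; the real transport implementation lives in
  block/bsg-lib.c and does considerably more than this. ]

#include <linux/blkdev.h>
#include <linux/bsg.h>

/* Sketch only: accept transport requests, reject plain CDBs. */
static int example_transport_check_proto(struct sg_io_v4 *hdr)
{
	if (hdr->protocol != BSG_PROTOCOL_SCSI ||
	    hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
		return -EINVAL;
	return 0;
}

static int example_transport_fill_hdr(struct request *rq,
		struct sg_io_v4 *hdr, fmode_t mode)
{
	/* a real transport copies hdr->request into its own per-request
	 * state here instead of into a struct scsi_request */
	return 0;
}

static int example_transport_complete_rq(struct request *rq,
		struct sg_io_v4 *hdr)
{
	/* a real transport reports status and reply residuals from its
	 * per-request state */
	hdr->device_status = 0;
	hdr->transport_status = 0;
	hdr->driver_status = 0;
	return 0;
}

static void example_transport_free_rq(struct request *rq)
{
	/* free whatever fill_hdr allocated */
}

static const struct bsg_ops example_transport_ops = {
	.check_proto	= example_transport_check_proto,
	.fill_hdr	= example_transport_fill_hdr,
	.complete_rq	= example_transport_complete_rq,
	.free_rq	= example_transport_free_rq,
};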
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..8a404ff29f9c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2105,8 +2105,6 @@  void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
 	struct device *dev = shost->dma_dev;
 
-	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
-
 	/*
 	 * this limit is imposed by hardware restrictions
 	 */
@@ -2202,6 +2200,7 @@  struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
 	}
 
 	__scsi_init_queue(shost, q);
+	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
 	blk_queue_prep_rq(q, scsi_prep_fn);
 	blk_queue_unprep_rq(q, scsi_unprep_fn);
 	blk_queue_softirq_done(q, scsi_softirq_done);
@@ -2231,6 +2230,7 @@  struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
 
 	sdev->request_queue->queuedata = sdev;
 	__scsi_init_queue(sdev->host, sdev->request_queue);
+	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
 	return sdev->request_queue;
 }
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index bf53356f41f0..bfb4e6875643 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1262,8 +1262,7 @@  int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 	transport_add_device(&sdev->sdev_gendev);
 	sdev->is_visible = 1;
 
-	error = bsg_register_queue(rq, &sdev->sdev_gendev, NULL, NULL);
-
+	error = bsg_scsi_register_queue(rq, &sdev->sdev_gendev);
 	if (error)
 		/* we're treating error on bsg register as non-fatal,
 		 * so pretend nothing went wrong */
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 736a1f4f9676..4889bd432382 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -228,7 +228,6 @@  static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
 	 */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
 	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
-	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
 	return 0;
 }
 
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index 08762d297cbd..28a7ccc55c89 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -38,7 +38,6 @@  struct bsg_buffer {
 };
 
 struct bsg_job {
-	struct scsi_request sreq;
 	struct device *dev;
 
 	struct kref kref;
@@ -64,6 +63,9 @@  struct bsg_job {
 	struct bsg_buffer request_payload;
 	struct bsg_buffer reply_payload;
 
+	int result;
+	unsigned int reply_payload_rcv_len;
+
 	void *dd_data;		/* Used for driver-specific storage */
 };
 
diff --git a/include/linux/bsg.h b/include/linux/bsg.h
index 7173f6e9d2dd..ed98946a8324 100644
--- a/include/linux/bsg.h
+++ b/include/linux/bsg.h
@@ -1,33 +1,42 @@ 
-#ifndef BSG_H
-#define BSG_H
+#ifndef _LINUX_BSG_H
+#define _LINUX_BSG_H
 
 #include <uapi/linux/bsg.h>
 
+struct request;
+
+#ifdef CONFIG_BLK_DEV_BSG
+struct bsg_ops {
+	int	(*check_proto)(struct sg_io_v4 *hdr);
+	int	(*fill_hdr)(struct request *rq, struct sg_io_v4 *hdr,
+				fmode_t mode);
+	int	(*complete_rq)(struct request *rq, struct sg_io_v4 *hdr);
+	void	(*free_rq)(struct request *rq);
+};
 
-#if defined(CONFIG_BLK_DEV_BSG)
 struct bsg_class_device {
 	struct device *class_dev;
 	struct device *parent;
 	int minor;
 	struct request_queue *queue;
 	struct kref ref;
+	const struct bsg_ops *ops;
 	void (*release)(struct device *);
 };
 
-extern int bsg_register_queue(struct request_queue *q,
-			      struct device *parent, const char *name,
-			      void (*release)(struct device *));
-extern void bsg_unregister_queue(struct request_queue *);
+int bsg_register_queue(struct request_queue *q, struct device *parent,
+		const char *name, const struct bsg_ops *ops,
+		void (*release)(struct device *));
+int bsg_scsi_register_queue(struct request_queue *q, struct device *parent);
+void bsg_unregister_queue(struct request_queue *);
 #else
-static inline int bsg_register_queue(struct request_queue *q,
-				     struct device *parent, const char *name,
-				     void (*release)(struct device *))
+static inline int bsg_scsi_register_queue(struct request_queue *q,
+		struct device *parent)
 {
 	return 0;
 }
 static inline void bsg_unregister_queue(struct request_queue *q)
 {
 }
-#endif
-
-#endif
+#endif /* CONFIG_BLK_DEV_BSG */
+#endif /* _LINUX_BSG_H */
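
[ Illustration, not part of the patch: registration after the split.  SCSI
  device nodes go through the new bsg_scsi_register_queue() wrapper, while a
  transport passes its own ops to the extended bsg_register_queue().  A
  hedged sketch, reusing the hypothetical example_transport_ops from the
  earlier note: ]

#include <linux/bsg.h>
#include <linux/device.h>

/* Sketch only: hypothetical transport-side registration helper. */
static int example_transport_bsg_init(struct request_queue *q,
		struct device *dev)
{
	return bsg_register_queue(q, dev, dev_name(dev),
				  &example_transport_ops, NULL);
}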