From patchwork Fri May 24 18:47:51 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960317
From: Douglas Gilbert
To: linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de,
 bart.vanassche@wdc.com
Subject: [PATCH 01/19] sg: move functions around
Date: Fri, 24 May 2019 14:47:51 -0400
Message-Id: <20190524184809.25121-2-dgilbert@interlog.com>
In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com>
References: <20190524184809.25121-1-dgilbert@interlog.com>

Move code around so that it has this basic ordering: open(), close(),
write(), read(), ioctl(), other system calls (e.g. mmap()), support code
and finally debug code. The main change is to place the write()-associated
code before the read()-associated code. The write() system call is
associated with submitting SCSI commands (i.e. writing metadata to the
device), while the read() system call is associated with receiving the
responses of earlier submitted commands. Helper functions are often placed
above their callers to reduce the number of forward function declarations
needed.
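For readers new to this driver, the submit/receive split that motivates the
ordering is visible in the sg v3 user-space pattern sketched below. The
device path, CDB and timeout are illustrative and error handling is
trimmed; the synchronous ioctl(SG_IO) performs both halves in one call.

/* Sketch: submit a SCSI command with write(), reap its response with
 * read(), mirroring the write-before-read ordering in the driver. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <scsi/sg.h>

int main(void)
{
	unsigned char cdb[6] = { 0 };		/* TEST UNIT READY (all zeros) */
	unsigned char sense[32];
	struct sg_io_hdr hdr;
	int fd = open("/dev/sg0", O_RDWR);	/* example device node */

	if (fd < 0)
		return 1;
	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';			/* sg v3 interface */
	hdr.cmd_len = sizeof(cdb);
	hdr.cmdp = cdb;
	hdr.mx_sb_len = sizeof(sense);
	hdr.sbp = sense;
	hdr.dxfer_direction = SG_DXFER_NONE;	/* no data transfer */
	hdr.timeout = 20000;			/* milliseconds */

	if (write(fd, &hdr, sizeof(hdr)) < 0)	/* submit the command */
		return 1;
	if (read(fd, &hdr, sizeof(hdr)) < 0)	/* receive the response */
		return 1;
	close(fd);
	return hdr.masked_status ? 1 : 0;
}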
Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 493 ++++++++++++++++++++++++---------------------- 1 file changed, 258 insertions(+), 235 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index bcdc28e5ede7..e20cd0d9e27b 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -13,6 +13,7 @@ static int sg_version_num = 30536; /* 2 digits for each component */ #define SG_VERSION_STR "3.5.36" +static char *sg_version_date = "20140603"; /* * D. P. Gilbert (dgilbert@interlog.com), notes: @@ -47,6 +48,7 @@ static int sg_version_num = 30536; /* 2 digits for each component */ #include #include #include /* for sg_check_file_access() */ +#include #include "scsi.h" #include @@ -57,12 +59,6 @@ static int sg_version_num = 30536; /* 2 digits for each component */ #include "scsi_logging.h" -#ifdef CONFIG_SCSI_PROC_FS -#include -static char *sg_version_date = "20140603"; - -static int sg_proc_init(void); -#endif #define SG_ALLOW_DIO_DEF 0 @@ -173,11 +169,11 @@ typedef struct sg_device { /* holds the state of each scsi generic device */ /* tasklet or soft irq callback */ static void sg_rq_end_io(struct request *rq, blk_status_t status); +/* Declarations of other static functions used before they are defined */ +static int sg_proc_init(void); static int sg_start_req(Sg_request *srp, unsigned char *cmd); static int sg_finish_rem_req(Sg_request * srp); static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size); -static ssize_t sg_new_read(Sg_fd * sfp, char __user *buf, size_t count, - Sg_request * srp); static ssize_t sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, size_t count, int blocking, int read_only, int sg_io_owned, Sg_request **o_srp); @@ -190,7 +186,6 @@ static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size); static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp); static Sg_fd *sg_add_sfp(Sg_device * sdp); static void sg_remove_sfp(struct kref *); -static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id); static Sg_request *sg_add_request(Sg_fd * sfp); static int sg_remove_request(Sg_fd * sfp, Sg_request * srp); static Sg_device *sg_get_dev(int dev); @@ -232,16 +227,6 @@ static int sg_check_file_access(struct file *filp, const char *caller) return 0; } -static int sg_allow_access(struct file *filp, unsigned char *cmd) -{ - struct sg_fd *sfp = filp->private_data; - - if (sfp->parentdp->device->type == TYPE_SCANNER) - return 0; - - return blk_verify_command(cmd, filp->f_mode); -} - static int open_wait(Sg_device *sdp, int flags) { @@ -405,200 +390,12 @@ sg_release(struct inode *inode, struct file *filp) return 0; } -static ssize_t -sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos) -{ - Sg_device *sdp; - Sg_fd *sfp; - Sg_request *srp; - int req_pack_id = -1; - sg_io_hdr_t *hp; - struct sg_header *old_hdr = NULL; - int retval = 0; - - /* - * This could cause a response to be stranded. Close the associated - * file descriptor to free up any resources being held. 
- */ - retval = sg_check_file_access(filp, __func__); - if (retval) - return retval; - - if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) - return -ENXIO; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_read: count=%d\n", (int) count)); - - if (!access_ok(buf, count)) - return -EFAULT; - if (sfp->force_packid && (count >= SZ_SG_HEADER)) { - old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL); - if (!old_hdr) - return -ENOMEM; - if (__copy_from_user(old_hdr, buf, SZ_SG_HEADER)) { - retval = -EFAULT; - goto free_old_hdr; - } - if (old_hdr->reply_len < 0) { - if (count >= SZ_SG_IO_HDR) { - sg_io_hdr_t *new_hdr; - new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL); - if (!new_hdr) { - retval = -ENOMEM; - goto free_old_hdr; - } - retval =__copy_from_user - (new_hdr, buf, SZ_SG_IO_HDR); - req_pack_id = new_hdr->pack_id; - kfree(new_hdr); - if (retval) { - retval = -EFAULT; - goto free_old_hdr; - } - } - } else - req_pack_id = old_hdr->pack_id; - } - srp = sg_get_rq_mark(sfp, req_pack_id); - if (!srp) { /* now wait on packet to arrive */ - if (atomic_read(&sdp->detaching)) { - retval = -ENODEV; - goto free_old_hdr; - } - if (filp->f_flags & O_NONBLOCK) { - retval = -EAGAIN; - goto free_old_hdr; - } - retval = wait_event_interruptible(sfp->read_wait, - (atomic_read(&sdp->detaching) || - (srp = sg_get_rq_mark(sfp, req_pack_id)))); - if (atomic_read(&sdp->detaching)) { - retval = -ENODEV; - goto free_old_hdr; - } - if (retval) { - /* -ERESTARTSYS as signal hit process */ - goto free_old_hdr; - } - } - if (srp->header.interface_id != '\0') { - retval = sg_new_read(sfp, buf, count, srp); - goto free_old_hdr; - } - - hp = &srp->header; - if (old_hdr == NULL) { - old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL); - if (! old_hdr) { - retval = -ENOMEM; - goto free_old_hdr; - } - } - memset(old_hdr, 0, SZ_SG_HEADER); - old_hdr->reply_len = (int) hp->timeout; - old_hdr->pack_len = old_hdr->reply_len; /* old, strange behaviour */ - old_hdr->pack_id = hp->pack_id; - old_hdr->twelve_byte = - ((srp->data.cmd_opcode >= 0xc0) && (12 == hp->cmd_len)) ? 1 : 0; - old_hdr->target_status = hp->masked_status; - old_hdr->host_status = hp->host_status; - old_hdr->driver_status = hp->driver_status; - if ((CHECK_CONDITION & hp->masked_status) || - (DRIVER_SENSE & hp->driver_status)) - memcpy(old_hdr->sense_buffer, srp->sense_b, - sizeof (old_hdr->sense_buffer)); - switch (hp->host_status) { - /* This setup of 'result' is for backward compatibility and is best - ignored by the user who should use target, host + driver status */ - case DID_OK: - case DID_PASSTHROUGH: - case DID_SOFT_ERROR: - old_hdr->result = 0; - break; - case DID_NO_CONNECT: - case DID_BUS_BUSY: - case DID_TIME_OUT: - old_hdr->result = EBUSY; - break; - case DID_BAD_TARGET: - case DID_ABORT: - case DID_PARITY: - case DID_RESET: - case DID_BAD_INTR: - old_hdr->result = EIO; - break; - case DID_ERROR: - old_hdr->result = (srp->sense_b[0] == 0 && - hp->masked_status == GOOD) ? 0 : EIO; - break; - default: - old_hdr->result = EIO; - break; - } - - /* Now copy the result back to the user buffer. */ - if (count >= SZ_SG_HEADER) { - if (__copy_to_user(buf, old_hdr, SZ_SG_HEADER)) { - retval = -EFAULT; - goto free_old_hdr; - } - buf += SZ_SG_HEADER; - if (count > old_hdr->reply_len) - count = old_hdr->reply_len; - if (count > SZ_SG_HEADER) { - if (sg_read_oxfer(srp, buf, count - SZ_SG_HEADER)) { - retval = -EFAULT; - goto free_old_hdr; - } - } - } else - count = (old_hdr->result == 0) ? 
0 : -EIO; - sg_finish_rem_req(srp); - sg_remove_request(sfp, srp); - retval = count; -free_old_hdr: - kfree(old_hdr); - return retval; -} - -static ssize_t -sg_new_read(Sg_fd * sfp, char __user *buf, size_t count, Sg_request * srp) -{ - sg_io_hdr_t *hp = &srp->header; - int err = 0, err2; - int len; - - if (count < SZ_SG_IO_HDR) { - err = -EINVAL; - goto err_out; - } - hp->sb_len_wr = 0; - if ((hp->mx_sb_len > 0) && hp->sbp) { - if ((CHECK_CONDITION & hp->masked_status) || - (DRIVER_SENSE & hp->driver_status)) { - int sb_len = SCSI_SENSE_BUFFERSIZE; - sb_len = (hp->mx_sb_len > sb_len) ? sb_len : hp->mx_sb_len; - len = 8 + (int) srp->sense_b[7]; /* Additional sense length field */ - len = (len > sb_len) ? sb_len : len; - if (copy_to_user(hp->sbp, srp->sense_b, len)) { - err = -EFAULT; - goto err_out; - } - hp->sb_len_wr = len; - } - } - if (hp->masked_status || hp->host_status || hp->driver_status) - hp->info |= SG_INFO_CHECK; - if (copy_to_user(buf, hp, SZ_SG_IO_HDR)) { - err = -EFAULT; - goto err_out; - } -err_out: - err2 = sg_finish_rem_req(srp); - sg_remove_request(sfp, srp); - return err ? : err2 ? : count; -} - +/* + * write(2) related functions follow. They are shown before read(2) related + * functions. That is because SCSI commands/requests are first "written" to + * the SCSI device by using write(2), ioctl(SG_IOSUBMIT) or the first half + * of the synchronous ioctl(SG_IO) system call. + */ static ssize_t sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) { @@ -710,6 +507,16 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) return (k < 0) ? k : count; } +static int sg_allow_access(struct file *filp, unsigned char *cmd) +{ + struct sg_fd *sfp = filp->private_data; + + if (sfp->parentdp->device->type == TYPE_SCANNER) + return 0; + + return blk_verify_command(cmd, filp->f_mode); +} + static ssize_t sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, size_t count, int blocking, int read_only, int sg_io_owned, @@ -840,6 +647,74 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, return 0; } + +/* + * read(2) related functions follow. They are shown after write(2) related + * functions. Apart from read(2) itself, ioctl(SG_IORECEIVE) and the second + * half of the ioctl(SG_IO) share code with read(2). + */ + +static Sg_request * +sg_get_rq_mark(Sg_fd *sfp, int pack_id) +{ + Sg_request *resp; + unsigned long iflags; + + write_lock_irqsave(&sfp->rq_list_lock, iflags); + list_for_each_entry(resp, &sfp->rq_list, entry) { + /* look for requests that are ready + not SG_IO owned */ + if ((resp->done == 1) && (!resp->sg_io_owned) && + ((-1 == pack_id) || (resp->header.pack_id == pack_id))) { + resp->done = 2; /* guard against other readers */ + write_unlock_irqrestore(&sfp->rq_list_lock, iflags); + return resp; + } + } + write_unlock_irqrestore(&sfp->rq_list_lock, iflags); + return NULL; +} + +static ssize_t +sg_new_read(Sg_fd *sfp, char __user *buf, size_t count, Sg_request *srp) +{ + sg_io_hdr_t *hp = &srp->header; + int err = 0, err2; + int len; + + if (count < SZ_SG_IO_HDR) { + err = -EINVAL; + goto err_out; + } + hp->sb_len_wr = 0; + if ((hp->mx_sb_len > 0) && hp->sbp) { + if ((CHECK_CONDITION & hp->masked_status) || + (DRIVER_SENSE & hp->driver_status)) { + int sb_len = SCSI_SENSE_BUFFERSIZE; + + sb_len = (hp->mx_sb_len > sb_len) ? sb_len : + hp->mx_sb_len; + /* Additional sense length field */ + len = 8 + (int) srp->sense_b[7]; + len = (len > sb_len) ? 
sb_len : len; + if (copy_to_user(hp->sbp, srp->sense_b, len)) { + err = -EFAULT; + goto err_out; + } + hp->sb_len_wr = len; + } + } + if (hp->masked_status || hp->host_status || hp->driver_status) + hp->info |= SG_INFO_CHECK; + if (copy_to_user(buf, hp, SZ_SG_IO_HDR)) { + err = -EFAULT; + goto err_out; + } +err_out: + err2 = sg_finish_rem_req(srp); + sg_remove_request(sfp, srp); + return err ? : err2 ? : count; +} + static int srp_done(Sg_fd *sfp, Sg_request *srp) { unsigned long flags; @@ -851,6 +726,168 @@ static int srp_done(Sg_fd *sfp, Sg_request *srp) return ret; } +static ssize_t +sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) +{ + Sg_device *sdp; + Sg_fd *sfp; + Sg_request *srp; + int req_pack_id = -1; + sg_io_hdr_t *hp; + struct sg_header *old_hdr = NULL; + int retval = 0; + + /* + * This could cause a response to be stranded. Close the associated + * file descriptor to free up any resources being held. + */ + retval = sg_check_file_access(filp, __func__); + if (retval) + return retval; + + sfp = filp->private_data; + sdp = sfp->parentdp; + SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, + "%s: count=%d\n", __func__, + (int) count)); + if (!sdp) + return -ENXIO; + + if (!access_ok(buf, count)) + return -EFAULT; + if (sfp->force_packid && (count >= SZ_SG_HEADER)) { + old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL); + if (!old_hdr) + return -ENOMEM; + if (__copy_from_user(old_hdr, buf, SZ_SG_HEADER)) { + retval = -EFAULT; + goto free_old_hdr; + } + if (old_hdr->reply_len < 0) { + if (count >= SZ_SG_IO_HDR) { + sg_io_hdr_t *new_hdr; + + new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL); + if (!new_hdr) { + retval = -ENOMEM; + goto free_old_hdr; + } + retval = __copy_from_user + (new_hdr, buf, SZ_SG_IO_HDR); + req_pack_id = new_hdr->pack_id; + kfree(new_hdr); + if (retval) { + retval = -EFAULT; + goto free_old_hdr; + } + } + } else + req_pack_id = old_hdr->pack_id; + } + srp = sg_get_rq_mark(sfp, req_pack_id); + if (!srp) { /* now wait on packet to arrive */ + if (atomic_read(&sdp->detaching)) { + retval = -ENODEV; + goto free_old_hdr; + } + if (filp->f_flags & O_NONBLOCK) { + retval = -EAGAIN; + goto free_old_hdr; + } + retval = wait_event_interruptible(sfp->read_wait, + (atomic_read(&sdp->detaching) || + (srp = sg_get_rq_mark(sfp, req_pack_id)))); + if (atomic_read(&sdp->detaching)) { + retval = -ENODEV; + goto free_old_hdr; + } + if (retval) { + /* -ERESTARTSYS as signal hit process */ + goto free_old_hdr; + } + } + if (srp->header.interface_id != '\0') { + retval = sg_new_read(sfp, buf, count, srp); + goto free_old_hdr; + } + + hp = &srp->header; + if (old_hdr == NULL) { + old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL); + if (!old_hdr) { + retval = -ENOMEM; + goto free_old_hdr; + } + } + memset(old_hdr, 0, SZ_SG_HEADER); + old_hdr->reply_len = (int) hp->timeout; + old_hdr->pack_len = old_hdr->reply_len; /* old, strange behaviour */ + old_hdr->pack_id = hp->pack_id; + old_hdr->twelve_byte = + ((srp->data.cmd_opcode >= 0xc0) && (hp->cmd_len == 12)) ? 
1 : 0; + old_hdr->target_status = hp->masked_status; + old_hdr->host_status = hp->host_status; + old_hdr->driver_status = hp->driver_status; + if ((hp->masked_status & CHECK_CONDITION) || + (hp->driver_status & DRIVER_SENSE)) + memcpy(old_hdr->sense_buffer, srp->sense_b, + sizeof(old_hdr->sense_buffer)); + switch (hp->host_status) { + /* + * This setup of 'result' is for backward compatibility and is best + * ignored by the user who should use target, host + driver status + */ + case DID_OK: + case DID_PASSTHROUGH: + case DID_SOFT_ERROR: + old_hdr->result = 0; + break; + case DID_NO_CONNECT: + case DID_BUS_BUSY: + case DID_TIME_OUT: + old_hdr->result = EBUSY; + break; + case DID_BAD_TARGET: + case DID_ABORT: + case DID_PARITY: + case DID_RESET: + case DID_BAD_INTR: + old_hdr->result = EIO; + break; + case DID_ERROR: + old_hdr->result = (srp->sense_b[0] == 0 && + hp->masked_status == GOOD) ? 0 : EIO; + break; + default: + old_hdr->result = EIO; + break; + } + + /* Now copy the result back to the user buffer. */ + if (count >= SZ_SG_HEADER) { + if (__copy_to_user(buf, old_hdr, SZ_SG_HEADER)) { + retval = -EFAULT; + goto free_old_hdr; + } + buf += SZ_SG_HEADER; + if (count > old_hdr->reply_len) + count = old_hdr->reply_len; + if (count > SZ_SG_HEADER) { + if (sg_read_oxfer(srp, buf, count - SZ_SG_HEADER)) { + retval = -EFAULT; + goto free_old_hdr; + } + } + } else + count = (old_hdr->result == 0) ? 0 : -EIO; + sg_finish_rem_req(srp); + sg_remove_request(sfp, srp); + retval = count; +free_old_hdr: + kfree(old_hdr); + return retval; +} + static int max_sectors_bytes(struct request_queue *q) { unsigned int max_sectors = queue_max_sectors(q); @@ -1669,9 +1706,7 @@ init_sg(void) sg_sysfs_valid = 1; rc = scsi_register_interface(&sg_interface); if (0 == rc) { -#ifdef CONFIG_SCSI_PROC_FS sg_proc_init(); -#endif /* CONFIG_SCSI_PROC_FS */ return 0; } class_destroy(sg_sysfs_class); @@ -1680,6 +1715,14 @@ init_sg(void) return rc; } +#ifndef CONFIG_SCSI_PROC_FS +static int +sg_proc_init(void) +{ + return 0; +} +#endif + static void __exit exit_sg(void) { @@ -2069,26 +2112,6 @@ sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp) sfp->res_in_use = 0; } -static Sg_request * -sg_get_rq_mark(Sg_fd * sfp, int pack_id) -{ - Sg_request *resp; - unsigned long iflags; - - write_lock_irqsave(&sfp->rq_list_lock, iflags); - list_for_each_entry(resp, &sfp->rq_list, entry) { - /* look for requests that are ready + not SG_IO owned */ - if ((1 == resp->done) && (!resp->sg_io_owned) && - ((-1 == pack_id) || (resp->header.pack_id == pack_id))) { - resp->done = 2; /* guard against other readers */ - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return resp; - } - } - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return NULL; -} - /* always adds to end of list */ static Sg_request * sg_add_request(Sg_fd * sfp)

From patchwork Fri May 24 18:47:52 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960321
From: Douglas Gilbert
To: linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de,
 bart.vanassche@wdc.com
Subject: [PATCH 02/19] sg: remove typedefs, type+formatting cleanup
Date: Fri, 24 May 2019 14:47:52 -0400
Message-Id: <20190524184809.25121-3-dgilbert@interlog.com>
In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com>
References: <20190524184809.25121-1-dgilbert@interlog.com>

Typedefs for structure types are discouraged, so the structures that are
private to this driver have had their typedefs removed. This also removes
most of the "camel" (i.e. mixed case) type names.
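As a condensed sketch of the conversion (the fields are borrowed from the
sg_scatter_hold hunks below; the local u16 typedef is only there so the
fragment stands alone):

/* Stand-in for the kernel's fixed-width type, for this sketch only */
typedef unsigned short u16;

/* Before: a private struct hidden behind a typedef with a "camel" name */
typedef struct sg_scatter_hold_old {
	unsigned short k_use_sg;	/* count of scatter-gather pieces */
	unsigned int bufflen;		/* size of (aggregate) data buffer */
} Sg_scatter_hold;

/* After: bare struct tag, kernel fixed-width types, no typedef */
struct sg_scatter_hold {
	u16 k_use_sg;
	unsigned int bufflen;
};

static Sg_scatter_hold old_style_hold;		/* form being removed */
static struct sg_scatter_hold new_style_hold;	/* form after this patch */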
Signed-off-by: Douglas Gilbert
---
 drivers/scsi/sg.c | 396 +++++++++++++++++++++++++---------------------
 1 file changed, 220 insertions(+), 176 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index e20cd0d9e27b..7be3d777dbd4 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -3,25 +3,18 @@ * History: * Started: Aug 9 by Lawrence Foard (entropy@world.std.com), * to allow user process control of SCSI devices. - * Development Sponsored by Killy Corp. NY NY + * Development Sponsored by Killy Corp. NY NY [1992 ??] * * Original driver (sg.c): * Copyright (C) 1992 Lawrence Foard * Version 2 and 3 extensions to driver: - * Copyright (C) 1998 - 2014 Douglas Gilbert + * Copyright (C) 1998 - 2019 Douglas Gilbert */ static int sg_version_num = 30536; /* 2 digits for each component */ #define SG_VERSION_STR "3.5.36" static char *sg_version_date = "20140603"; -/* - * D. P. Gilbert (dgilbert@interlog.com), notes: - * - scsi logging is available via SCSI_LOG_TIMEOUT macros. First - * the kernel/module needs to be built with CONFIG_SCSI_LOGGING - * (otherwise the macros compile to empty statements).
- * - */ #include #include @@ -91,33 +84,32 @@ static int sg_add_device(struct device *, struct class_interface *); static void sg_remove_device(struct device *, struct class_interface *); static DEFINE_IDR(sg_index_idr); -static DEFINE_RWLOCK(sg_index_lock); /* Also used to lock - file descriptor list for device */ +static DEFINE_RWLOCK(sg_index_lock); /* Also used to lock fd list for device */ static struct class_interface sg_interface = { .add_dev = sg_add_device, .remove_dev = sg_remove_device, }; -typedef struct sg_scatter_hold { /* holding area for scsi scatter gather info */ - unsigned short k_use_sg; /* Count of kernel scatter-gather pieces */ - unsigned sglist_len; /* size of malloc'd scatter-gather list ++ */ - unsigned bufflen; /* Size of (aggregate) data buffer */ +struct sg_scatter_hold { /* holding area for scsi scatter gather info */ + u16 k_use_sg; /* Count of kernel scatter-gather pieces */ + unsigned int sglist_len; /* size of malloc'd scatter-gather list ++ */ + unsigned int bufflen; /* Size of (aggregate) data buffer */ struct page **pages; int page_order; char dio_in_use; /* 0->indirect IO (or mmap), 1->dio */ - unsigned char cmd_opcode; /* first byte of command */ -} Sg_scatter_hold; + u8 cmd_opcode; /* first byte of command */ +}; struct sg_device; /* forward declarations */ struct sg_fd; -typedef struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */ +struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */ struct list_head entry; /* list entry */ struct sg_fd *parentfp; /* NULL -> not in use */ - Sg_scatter_hold data; /* hold buffer, perhaps scatter list */ + struct sg_scatter_hold data; /* hold buffer, perhaps scatter list */ sg_io_hdr_t header; /* scsi command+info, see */ - unsigned char sense_b[SCSI_SENSE_BUFFERSIZE]; + u8 sense_b[SCSI_SENSE_BUFFERSIZE]; char res_used; /* 1 -> using reserve buffer, 0 -> not ... 
*/ char orphan; /* 1 -> drop on sight, 0 -> normal */ char sg_io_owned; /* 1 -> packet belongs to SG_IO */ @@ -126,9 +118,9 @@ typedef struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */ struct request *rq; struct bio *bio; struct execute_work ew; -} Sg_request; +}; -typedef struct sg_fd { /* holds the state of a file descriptor */ +struct sg_fd { /* holds the state of a file descriptor */ struct list_head sfd_siblings; /* protected by device's sfd_lock */ struct sg_device *parentdp; /* owning device */ wait_queue_head_t read_wait; /* queue read until command done */ @@ -136,21 +128,21 @@ typedef struct sg_fd { /* holds the state of a file descriptor */ struct mutex f_mutex; /* protect against changes in this fd */ int timeout; /* defaults to SG_DEFAULT_TIMEOUT */ int timeout_user; /* defaults to SG_DEFAULT_TIMEOUT_USER */ - Sg_scatter_hold reserve; /* buffer held for this file descriptor */ + struct sg_scatter_hold reserve; /* buffer for this file descriptor */ struct list_head rq_list; /* head of request list */ struct fasync_struct *async_qp; /* used by asynchronous notification */ - Sg_request req_arr[SG_MAX_QUEUE]; /* used as singly-linked list */ + struct sg_request req_arr[SG_MAX_QUEUE];/* use as singly-linked list */ char force_packid; /* 1 -> pack_id input to read(), 0 -> ignored */ char cmd_q; /* 1 -> allow command queuing, 0 -> don't */ - unsigned char next_cmd_len; /* 0: automatic, >0: use on next write() */ + u8 next_cmd_len; /* 0: automatic, >0: use on next write() */ char keep_orphan; /* 0 -> drop orphan (def), 1 -> keep for read() */ char mmap_called; /* 0 -> mmap() never called on this fd */ char res_in_use; /* 1 -> 'reserve' array in use */ struct kref f_ref; struct execute_work ew; -} Sg_fd; +}; -typedef struct sg_device { /* holds the state of each scsi generic device */ +struct sg_device { /* holds the state of each scsi generic device */ struct scsi_device *device; wait_queue_head_t open_wait; /* queue open() when O_EXCL present */ struct mutex open_rel_lock; /* held when in open() or release() */ @@ -165,30 +157,34 @@ typedef struct sg_device { /* holds the state of each scsi generic device */ struct gendisk *disk; struct cdev * cdev; /* char_dev [sysfs: /sys/cdev/major/sg] */ struct kref d_ref; -} Sg_device; +}; /* tasklet or soft irq callback */ static void sg_rq_end_io(struct request *rq, blk_status_t status); /* Declarations of other static functions used before they are defined */ static int sg_proc_init(void); -static int sg_start_req(Sg_request *srp, unsigned char *cmd); -static int sg_finish_rem_req(Sg_request * srp); -static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size); -static ssize_t sg_new_write(Sg_fd *sfp, struct file *file, +static int sg_start_req(struct sg_request *srp, u8 *cmd); +static int sg_finish_rem_req(struct sg_request *srp); +static int sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, + int buff_size); +static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf, size_t count, int blocking, - int read_only, int sg_io_owned, Sg_request **o_srp); -static int sg_common_write(Sg_fd * sfp, Sg_request * srp, - unsigned char *cmnd, int timeout, int blocking); -static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer); -static void sg_remove_scat(Sg_fd * sfp, Sg_scatter_hold * schp); -static void sg_build_reserve(Sg_fd * sfp, int req_size); -static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size); -static void 
sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp); -static Sg_fd *sg_add_sfp(Sg_device * sdp); + int read_only, int sg_io_owned, + struct sg_request **o_srp); +static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp, + u8 *cmnd, int timeout, int blocking); +static int sg_read_oxfer(struct sg_request *srp, char __user *outp, + int num_read_xfer); +static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp); +static void sg_build_reserve(struct sg_fd *sfp, int req_size); +static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, + int size); +static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp); +static struct sg_fd *sg_add_sfp(struct sg_device *sdp); static void sg_remove_sfp(struct kref *); -static Sg_request *sg_add_request(Sg_fd * sfp); -static int sg_remove_request(Sg_fd * sfp, Sg_request * srp); -static Sg_device *sg_get_dev(int dev); +static struct sg_request *sg_add_request(struct sg_fd *sfp); +static int sg_remove_request(struct sg_fd *sfp, struct sg_request *srp); +static struct sg_device *sg_get_dev(int dev); static void sg_device_destroy(struct kref *kref); #define SZ_SG_HEADER sizeof(struct sg_header) @@ -212,7 +208,8 @@ static void sg_device_destroy(struct kref *kref); * This function provides protection for the legacy API by restricting the * calling context. */ -static int sg_check_file_access(struct file *filp, const char *caller) +static int +sg_check_file_access(struct file *filp, const char *caller) { if (filp->f_cred != current_real_cred()) { pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n", @@ -228,11 +225,11 @@ static int sg_check_file_access(struct file *filp, const char *caller) } static int -open_wait(Sg_device *sdp, int flags) +sg_wait_open_event(struct sg_device *sdp, bool o_excl) { int retval = 0; - if (flags & O_EXCL) { + if (o_excl) { while (sdp->open_cnt > 0) { mutex_unlock(&sdp->open_rel_lock); retval = wait_event_interruptible(sdp->open_wait, @@ -263,26 +260,34 @@ open_wait(Sg_device *sdp, int flags) return retval; } -/* Returns 0 on success, else a negated errno value */ +/* + * Corresponds to the open() system call on sg devices. Implements O_EXCL on + * a per device basis using 'open_cnt'. If O_EXCL and O_NONBLOCK and there is + * already a sg handle open on this device then it fails with an errno of + * EBUSY. Without the O_NONBLOCK flag then this thread enters an interruptible + * wait until the other handle(s) are closed. 
+ */ static int sg_open(struct inode *inode, struct file *filp) { - int dev = iminor(inode); - int flags = filp->f_flags; + bool o_excl; + int min_dev = iminor(inode); + int op_flags = filp->f_flags; struct request_queue *q; - Sg_device *sdp; - Sg_fd *sfp; + struct sg_device *sdp; + struct sg_fd *sfp; int retval; nonseekable_open(inode, filp); - if ((flags & O_EXCL) && (O_RDONLY == (flags & O_ACCMODE))) + o_excl = !!(op_flags & O_EXCL); + if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY)) return -EPERM; /* Can't lock it with read only access */ - sdp = sg_get_dev(dev); + sdp = sg_get_dev(min_dev); if (IS_ERR(sdp)) return PTR_ERR(sdp); SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_open: flags=0x%x\n", flags)); + "sg_open: flags=0x%x\n", op_flags)); /* This driver's module count bumped by fops_get in */ /* Prevent the device driver from vanishing while we sleep */ @@ -297,7 +302,7 @@ sg_open(struct inode *inode, struct file *filp) /* scsi_block_when_processing_errors() may block so bypass * check if O_NONBLOCK. Permits SCSI commands to be issued * during error recovery. Tread carefully. */ - if (!((flags & O_NONBLOCK) || + if (!((op_flags & O_NONBLOCK) || scsi_block_when_processing_errors(sdp->device))) { retval = -ENXIO; /* we are in error recovery for this device */ @@ -305,8 +310,8 @@ sg_open(struct inode *inode, struct file *filp) } mutex_lock(&sdp->open_rel_lock); - if (flags & O_NONBLOCK) { - if (flags & O_EXCL) { + if (op_flags & O_NONBLOCK) { + if (o_excl) { if (sdp->open_cnt > 0) { retval = -EBUSY; goto error_mutex_locked; @@ -318,13 +323,13 @@ sg_open(struct inode *inode, struct file *filp) } } } else { - retval = open_wait(sdp, flags); + retval = sg_wait_open_event(sdp, o_excl); if (retval) /* -ERESTARTSYS or -ENODEV */ goto error_mutex_locked; } /* N.B. 
at this point we are holding the open_rel_lock */ - if (flags & O_EXCL) + if (o_excl) sdp->exclude = true; if (sdp->open_cnt < 1) { /* no existing opens */ @@ -348,7 +353,7 @@ sg_open(struct inode *inode, struct file *filp) return retval; out_undo: - if (flags & O_EXCL) { + if (o_excl) { sdp->exclude = false; /* undo if error */ wake_up_interruptible(&sdp->open_wait); } @@ -366,10 +371,12 @@ sg_open(struct inode *inode, struct file *filp) static int sg_release(struct inode *inode, struct file *filp) { - Sg_device *sdp; - Sg_fd *sfp; + struct sg_device *sdp; + struct sg_fd *sfp; - if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) + sfp = filp->private_data; + sdp = sfp->parentdp; + if (!sdp) return -ENXIO; SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_release\n")); @@ -401,20 +408,22 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) { int mxsize, cmd_size, k; int input_size, blocking; - unsigned char opcode; - Sg_device *sdp; - Sg_fd *sfp; - Sg_request *srp; + u8 opcode; + struct sg_device *sdp; + struct sg_fd *sfp; + struct sg_request *srp; struct sg_header old_hdr; sg_io_hdr_t *hp; - unsigned char cmnd[SG_MAX_CDB_SIZE]; + u8 cmnd[SG_MAX_CDB_SIZE]; int retval; retval = sg_check_file_access(filp, __func__); if (retval) return retval; - if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) + sfp = filp->private_data; + sdp = sfp->parentdp; + if (!sdp) return -ENXIO; SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_write: count=%d\n", (int) count)); @@ -467,7 +476,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) } hp = &srp->header; hp->interface_id = '\0'; /* indicator of old interface tunnelled */ - hp->cmd_len = (unsigned char) cmd_size; + hp->cmd_len = (u8) cmd_size; hp->iovec_count = 0; hp->mx_sb_len = 0; if (input_size > 0) @@ -507,7 +516,8 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) return (k < 0) ? k : count; } -static int sg_allow_access(struct file *filp, unsigned char *cmd) +static int +sg_allow_access(struct file *filp, u8 *cmd) { struct sg_fd *sfp = filp->private_data; @@ -518,14 +528,14 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd) } static ssize_t -sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, +sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf, size_t count, int blocking, int read_only, int sg_io_owned, - Sg_request **o_srp) + struct sg_request **o_srp) { int k; - Sg_request *srp; + struct sg_request *srp; sg_io_hdr_t *hp; - unsigned char cmnd[SG_MAX_CDB_SIZE]; + u8 cmnd[SG_MAX_CDB_SIZE]; int timeout; unsigned long ul_timeout; @@ -591,11 +601,11 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf, } static int -sg_common_write(Sg_fd * sfp, Sg_request * srp, - unsigned char *cmnd, int timeout, int blocking) +sg_common_write(struct sg_fd *sfp, struct sg_request *srp, + u8 *cmnd, int timeout, int blocking) { int k, at_head; - Sg_device *sdp = sfp->parentdp; + struct sg_device *sdp = sfp->parentdp; sg_io_hdr_t *hp = &srp->header; srp->data.cmd_opcode = cmnd[0]; /* hold opcode of command */ @@ -654,10 +664,10 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp, * half of the ioctl(SG_IO) share code with read(2). 
*/ -static Sg_request * -sg_get_rq_mark(Sg_fd *sfp, int pack_id) +static struct sg_request * +sg_get_rq_mark(struct sg_fd *sfp, int pack_id) { - Sg_request *resp; + struct sg_request *resp; unsigned long iflags; write_lock_irqsave(&sfp->rq_list_lock, iflags); @@ -675,7 +685,8 @@ sg_get_rq_mark(Sg_fd *sfp, int pack_id) } static ssize_t -sg_new_read(Sg_fd *sfp, char __user *buf, size_t count, Sg_request *srp) +sg_new_read(struct sg_fd *sfp, char __user *buf, size_t count, + struct sg_request *srp) { sg_io_hdr_t *hp = &srp->header; int err = 0, err2; @@ -715,7 +726,8 @@ sg_new_read(Sg_fd *sfp, char __user *buf, size_t count, Sg_request *srp) return err ? : err2 ? : count; } -static int srp_done(Sg_fd *sfp, Sg_request *srp) +static int +srp_done(struct sg_fd *sfp, struct sg_request *srp) { unsigned long flags; int ret; @@ -729,9 +741,9 @@ static int srp_done(Sg_fd *sfp, Sg_request *srp) static ssize_t sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) { - Sg_device *sdp; - Sg_fd *sfp; - Sg_request *srp; + struct sg_device *sdp; + struct sg_fd *sfp; + struct sg_request *srp; int req_pack_id = -1; sg_io_hdr_t *hp; struct sg_header *old_hdr = NULL; @@ -888,7 +900,8 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) return retval; } -static int max_sectors_bytes(struct request_queue *q) +static int +max_sectors_bytes(struct request_queue *q) { unsigned int max_sectors = queue_max_sectors(q); @@ -898,9 +911,9 @@ static int max_sectors_bytes(struct request_queue *q) } static void -sg_fill_request_table(Sg_fd *sfp, sg_req_info_t *rinfo) +sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo) { - Sg_request *srp; + struct sg_request *srp; int val; unsigned int ms; @@ -936,14 +949,15 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) void __user *p = (void __user *)arg; int __user *ip = p; int result, val, read_only; - Sg_device *sdp; - Sg_fd *sfp; - Sg_request *srp; + struct sg_device *sdp; + struct sg_fd *sfp; + struct sg_request *srp; unsigned long iflags; - if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) + sfp = filp->private_data; + sdp = sfp->parentdp; + if (!sdp) return -ENXIO; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_ioctl: cmd=0x%x\n", (int) cmd_in)); read_only = (O_RDWR != (filp->f_flags & O_ACCMODE)); @@ -1178,13 +1192,16 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) } #ifdef CONFIG_COMPAT -static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) +static long +sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) { - Sg_device *sdp; - Sg_fd *sfp; + struct sg_device *sdp; + struct sg_fd *sfp; struct scsi_device *sdev; - if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) + sfp = filp->private_data; + sdp = sfp->parentdp; + if (!sdp) return -ENXIO; sdev = sdp->device; @@ -1204,9 +1221,9 @@ static __poll_t sg_poll(struct file *filp, poll_table * wait) { __poll_t res = 0; - Sg_device *sdp; - Sg_fd *sfp; - Sg_request *srp; + struct sg_device *sdp; + struct sg_fd *sfp; + struct sg_request *srp; int count = 0; unsigned long iflags; @@ -1241,10 +1258,12 @@ sg_poll(struct file *filp, poll_table * wait) static int sg_fasync(int fd, struct file *filp, int mode) { - Sg_device *sdp; - Sg_fd *sfp; + struct sg_device *sdp; + struct sg_fd *sfp; - if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) + sfp = filp->private_data; + sdp = sfp->parentdp; + if (!sdp) return -ENXIO; 
SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_fasync: mode=%d\n", mode)); @@ -1256,13 +1275,21 @@ static vm_fault_t sg_vma_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; - Sg_fd *sfp; + struct sg_fd *sfp; unsigned long offset, len, sa; - Sg_scatter_hold *rsv_schp; + struct sg_scatter_hold *rsv_schp; int k, length; + const char *nbp = "==NULL, bad"; - if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data))) - return VM_FAULT_SIGBUS; + if (!vma) { + pr_warn("%s: vma%s\n", __func__, nbp); + goto out_err; + } + sfp = vma->vm_private_data; + if (!sfp) { + pr_warn("%s: sfp%s\n", __func__, nbp); + goto out_err; + } rsv_schp = &sfp->reserve; offset = vmf->pgoff << PAGE_SHIFT; if (offset >= rsv_schp->bufflen) @@ -1285,7 +1312,7 @@ sg_vma_fault(struct vm_fault *vmf) sa += len; offset -= len; } - +out_err: return VM_FAULT_SIGBUS; } @@ -1296,14 +1323,19 @@ static const struct vm_operations_struct sg_mmap_vm_ops = { static int sg_mmap(struct file *filp, struct vm_area_struct *vma) { - Sg_fd *sfp; + struct sg_fd *sfp; unsigned long req_sz, len, sa; - Sg_scatter_hold *rsv_schp; + struct sg_scatter_hold *rsv_schp; int k, length; int ret = 0; - if ((!filp) || (!vma) || (!(sfp = (Sg_fd *) filp->private_data))) + if (!filp || !vma) + return -ENXIO; + sfp = filp->private_data; + if (!sfp) { + pr_warn("sg: %s: sfp is NULL\n", __func__); return -ENXIO; + } req_sz = vma->vm_end - vma->vm_start; SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sfp->parentdp, "sg_mmap starting, vm_start=%p, len=%d\n", @@ -1354,8 +1386,8 @@ sg_rq_end_io(struct request *rq, blk_status_t status) { struct sg_request *srp = rq->end_io_data; struct scsi_request *req = scsi_req(rq); - Sg_device *sdp; - Sg_fd *sfp; + struct sg_device *sdp; + struct sg_fd *sfp; unsigned long iflags; unsigned int ms; char *sense; @@ -1467,21 +1499,18 @@ static struct class *sg_sysfs_class; static int sg_sysfs_valid = 0; -static Sg_device * +static struct sg_device * sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) { struct request_queue *q = scsidp->request_queue; - Sg_device *sdp; + struct sg_device *sdp; unsigned long iflags; int error; u32 k; - sdp = kzalloc(sizeof(Sg_device), GFP_KERNEL); - if (!sdp) { - sdev_printk(KERN_WARNING, scsidp, "%s: kmalloc Sg_device " - "failure\n", __func__); + sdp = kzalloc(sizeof(struct sg_device), GFP_KERNEL); + if (!sdp) return ERR_PTR(-ENOMEM); - } idr_preload(GFP_KERNEL); write_lock_irqsave(&sg_index_lock, iflags); @@ -1494,8 +1523,8 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) scsidp->type, SG_MAX_DEVS - 1); error = -ENODEV; } else { - sdev_printk(KERN_WARNING, scsidp, "%s: idr " - "allocation Sg_device failure: %d\n", + sdev_printk(KERN_WARNING, scsidp, + "%s: idr alloc sg_device failure: %d\n", __func__, error); } goto out_unlock; @@ -1534,7 +1563,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) { struct scsi_device *scsidp = to_scsi_device(cl_dev->parent); struct gendisk *disk; - Sg_device *sdp = NULL; + struct sg_device *sdp = NULL; struct cdev * cdev = NULL; int error; unsigned long iflags; @@ -1633,9 +1662,9 @@ static void sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf) { struct scsi_device *scsidp = to_scsi_device(cl_dev->parent); - Sg_device *sdp = dev_get_drvdata(cl_dev); + struct sg_device *sdp = dev_get_drvdata(cl_dev); unsigned long iflags; - Sg_fd *sfp; + struct sg_fd *sfp; int val; if (!sdp) @@ -1738,22 +1767,22 @@ exit_sg(void) } static int -sg_start_req(Sg_request *srp, unsigned char *cmd) 
+sg_start_req(struct sg_request *srp, u8 *cmd) { int res; struct request *rq; struct scsi_request *req; - Sg_fd *sfp = srp->parentfp; + struct sg_fd *sfp = srp->parentfp; sg_io_hdr_t *hp = &srp->header; int dxfer_len = (int) hp->dxfer_len; int dxfer_dir = hp->dxfer_direction; unsigned int iov_count = hp->iovec_count; - Sg_scatter_hold *req_schp = &srp->data; - Sg_scatter_hold *rsv_schp = &sfp->reserve; + struct sg_scatter_hold *req_schp = &srp->data; + struct sg_scatter_hold *rsv_schp = &sfp->reserve; struct request_queue *q = sfp->parentdp->device->request_queue; struct rq_map_data *md, map_data; int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ; - unsigned char *long_cmdp = NULL; + u8 *long_cmdp = NULL; SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, "sg_start_req: dxfer_len=%d\n", @@ -1868,12 +1897,12 @@ sg_start_req(Sg_request *srp, unsigned char *cmd) } static int -sg_finish_rem_req(Sg_request *srp) +sg_finish_rem_req(struct sg_request *srp) { int ret = 0; - Sg_fd *sfp = srp->parentfp; - Sg_scatter_hold *req_schp = &srp->data; + struct sg_fd *sfp = srp->parentfp; + struct sg_scatter_hold *req_schp = &srp->data; SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, "sg_finish_rem_req: res_used=%d\n", @@ -1895,7 +1924,8 @@ sg_finish_rem_req(Sg_request *srp) } static int -sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize) +sg_build_sgat(struct sg_scatter_hold *schp, const struct sg_fd *sfp, + int tablesize) { int sg_bufflen = tablesize * sizeof(struct page *); gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN; @@ -1908,7 +1938,8 @@ sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize) } static int -sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size) +sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, + int buff_size) { int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems; int sg_tablesize = sfp->parentdp->sg_tablesize; @@ -1990,7 +2021,7 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size) } static void -sg_remove_scat(Sg_fd * sfp, Sg_scatter_hold * schp) +sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp) { SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, "sg_remove_scat: k_use_sg=%d\n", schp->k_use_sg)); @@ -2013,9 +2044,9 @@ sg_remove_scat(Sg_fd * sfp, Sg_scatter_hold * schp) } static int -sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer) +sg_read_oxfer(struct sg_request *srp, char __user *outp, int num_read_xfer) { - Sg_scatter_hold *schp = &srp->data; + struct sg_scatter_hold *schp = &srp->data; int k, num; SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp, @@ -2046,9 +2077,9 @@ sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer) } static void -sg_build_reserve(Sg_fd * sfp, int req_size) +sg_build_reserve(struct sg_fd *sfp, int req_size) { - Sg_scatter_hold *schp = &sfp->reserve; + struct sg_scatter_hold *schp = &sfp->reserve; SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, "sg_build_reserve: req_size=%d\n", req_size)); @@ -2064,10 +2095,10 @@ sg_build_reserve(Sg_fd * sfp, int req_size) } static void -sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size) +sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size) { - Sg_scatter_hold *req_schp = &srp->data; - Sg_scatter_hold *rsv_schp = &sfp->reserve; + struct sg_scatter_hold *req_schp = &srp->data; + struct sg_scatter_hold *rsv_schp = &sfp->reserve; int k, num, rem; srp->res_used = 1; @@ -2095,9 +2126,9 @@ sg_link_reserve(Sg_fd * sfp, 
Sg_request * srp, int size) } static void -sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp) +sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp) { - Sg_scatter_hold *req_schp = &srp->data; + struct sg_scatter_hold *req_schp = &srp->data; SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp, "sg_unlink_reserve: req->k_use_sg=%d\n", @@ -2113,12 +2144,12 @@ sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp) } /* always adds to end of list */ -static Sg_request * -sg_add_request(Sg_fd * sfp) +static struct sg_request * +sg_add_request(struct sg_fd *sfp) { int k; unsigned long iflags; - Sg_request *rp = sfp->req_arr; + struct sg_request *rp = sfp->req_arr; write_lock_irqsave(&sfp->rq_list_lock, iflags); if (!list_empty(&sfp->rq_list)) { @@ -2132,7 +2163,7 @@ sg_add_request(Sg_fd * sfp) if (k >= SG_MAX_QUEUE) goto out_unlock; } - memset(rp, 0, sizeof (Sg_request)); + memset(rp, 0, sizeof(struct sg_request)); rp->parentfp = sfp; rp->header.duration = jiffies_to_msecs(jiffies); list_add_tail(&rp->entry, &sfp->rq_list); @@ -2145,7 +2176,7 @@ sg_add_request(Sg_fd * sfp) /* Return of 1 for found; 0 for not found */ static int -sg_remove_request(Sg_fd * sfp, Sg_request * srp) +sg_remove_request(struct sg_fd *sfp, struct sg_request *srp) { unsigned long iflags; int res = 0; @@ -2162,10 +2193,10 @@ sg_remove_request(Sg_fd * sfp, Sg_request * srp) return res; } -static Sg_fd * -sg_add_sfp(Sg_device * sdp) +static struct sg_fd * +sg_add_sfp(struct sg_device *sdp) { - Sg_fd *sfp; + struct sg_fd *sfp; unsigned long iflags; int bufflen; @@ -2215,13 +2246,13 @@ sg_remove_sfp_usercontext(struct work_struct *work) { struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work); struct sg_device *sdp = sfp->parentdp; - Sg_request *srp; + struct sg_request *srp; unsigned long iflags; /* Cleanup any responses which were never read(). 
*/ write_lock_irqsave(&sfp->rq_list_lock, iflags); while (!list_empty(&sfp->rq_list)) { - srp = list_first_entry(&sfp->rq_list, Sg_request, entry); + srp = list_first_entry(&sfp->rq_list, struct sg_request, entry); sg_finish_rem_req(srp); list_del(&srp->entry); srp->parentfp = NULL; @@ -2286,12 +2317,13 @@ sg_last_dev(void) #endif /* must be called with sg_index_lock held */ -static Sg_device *sg_lookup_dev(int dev) +static struct sg_device * +sg_lookup_dev(int dev) { return idr_find(&sg_index_idr, dev); } -static Sg_device * +static struct sg_device * sg_get_dev(int dev) { struct sg_device *sdp; @@ -2389,13 +2421,15 @@ sg_proc_init(void) } -static int sg_proc_seq_show_int(struct seq_file *s, void *v) +static int +sg_proc_seq_show_int(struct seq_file *s, void *v) { seq_printf(s, "%d\n", *((int *)s->private)); return 0; } -static int sg_proc_single_open_adio(struct inode *inode, struct file *file) +static int +sg_proc_single_open_adio(struct inode *inode, struct file *file) { return single_open(file, sg_proc_seq_show_int, &sg_allow_dio); } @@ -2416,7 +2450,8 @@ sg_proc_write_adio(struct file *filp, const char __user *buffer, return count; } -static int sg_proc_single_open_dressz(struct inode *inode, struct file *file) +static int +sg_proc_single_open_dressz(struct inode *inode, struct file *file) { return single_open(file, sg_proc_seq_show_int, &sg_big_buff); } @@ -2441,14 +2476,16 @@ sg_proc_write_dressz(struct file *filp, const char __user *buffer, return -ERANGE; } -static int sg_proc_seq_show_version(struct seq_file *s, void *v) +static int +sg_proc_seq_show_version(struct seq_file *s, void *v) { seq_printf(s, "%d\t%s [%s]\n", sg_version_num, SG_VERSION_STR, sg_version_date); return 0; } -static int sg_proc_seq_show_devhdr(struct seq_file *s, void *v) +static int +sg_proc_seq_show_devhdr(struct seq_file *s, void *v) { seq_puts(s, "host\tchan\tid\tlun\ttype\topens\tqdepth\tbusy\tonline\n"); return 0; @@ -2459,7 +2496,8 @@ struct sg_proc_deviter { size_t max; }; -static void * dev_seq_start(struct seq_file *s, loff_t *pos) +static void * +dev_seq_start(struct seq_file *s, loff_t *pos) { struct sg_proc_deviter * it = kmalloc(sizeof(*it), GFP_KERNEL); @@ -2474,7 +2512,8 @@ static void * dev_seq_start(struct seq_file *s, loff_t *pos) return it; } -static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos) +static void * +dev_seq_next(struct seq_file *s, void *v, loff_t *pos) { struct sg_proc_deviter * it = s->private; @@ -2482,15 +2521,17 @@ static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos) return (it->index < it->max) ? 
it : NULL; } -static void dev_seq_stop(struct seq_file *s, void *v) +static void +dev_seq_stop(struct seq_file *s, void *v) { kfree(s->private); } -static int sg_proc_seq_show_dev(struct seq_file *s, void *v) +static int +sg_proc_seq_show_dev(struct seq_file *s, void *v) { struct sg_proc_deviter * it = (struct sg_proc_deviter *) v; - Sg_device *sdp; + struct sg_device *sdp; struct scsi_device *scsidp; unsigned long iflags; @@ -2513,10 +2554,11 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v) return 0; } -static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v) +static int +sg_proc_seq_show_devstrs(struct seq_file *s, void *v) { struct sg_proc_deviter * it = (struct sg_proc_deviter *) v; - Sg_device *sdp; + struct sg_device *sdp; struct scsi_device *scsidp; unsigned long iflags; @@ -2533,11 +2575,12 @@ static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v) } /* must be called while holding sg_index_lock */ -static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp) +static void +sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp) { int k, new_interface, blen, usg; - Sg_request *srp; - Sg_fd *fp; + struct sg_request *srp; + struct sg_fd *fp; const sg_io_hdr_t *hp; const char * cp; unsigned int ms; @@ -2596,10 +2639,11 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp) } } -static int sg_proc_seq_show_debug(struct seq_file *s, void *v) +static int +sg_proc_seq_show_debug(struct seq_file *s, void *v) { struct sg_proc_deviter * it = (struct sg_proc_deviter *) v; - Sg_device *sdp; + struct sg_device *sdp; unsigned long iflags; if (it && (0 == it->index))

From patchwork Fri May 24 18:47:53 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960319
From: Douglas Gilbert
To: linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de,
 bart.vanassche@wdc.com
Subject: [PATCH 03/19] sg: sg_log and is_enabled
Date: Fri, 24 May 2019 14:47:53 -0400
Message-Id: <20190524184809.25121-4-dgilbert@interlog.com>
In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com>
References: <20190524184809.25121-1-dgilbert@interlog.com>

Replace the SCSI_LOG_TIMEOUT macros with SG_LOG macros across the driver.
SG_LOG calls SCSI_LOG_TIMEOUT if the scsi_device pointer is non-NULL, and
calls pr_info otherwise. It prints the thread id if current is non-NULL,
-1 otherwise. Also replace #if and #ifdef conditional compilation with
the IS_ENABLED macro.
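A stand-alone model of what such a macro does is sketched below. MY_LOGGING,
MY_LOG and struct my_dev are illustrative stand-ins; the real SG_LOG in the
hunk that follows wraps SCSI_LOG_TIMEOUT and sdev_prefix_printk instead of
printf. (##__VA_ARGS__ is the gcc/clang extension the kernel relies on.)

#include <stdio.h>

/* Stand-in for IS_ENABLED(CONFIG_SCSI_LOGGING); set to 0 and every
 * MY_LOG() call compiles away to an empty statement. */
#define MY_LOGGING 1

struct my_dev {
	const char *name;	/* plays the role of sdp->disk->disk_name */
};

#if MY_LOGGING
#define MY_LOG(dp, fmt, ...) \
	do { \
		if (dp) \
			printf("%s: " fmt, (dp)->name, ##__VA_ARGS__); \
		else \
			printf("sg: dp=NULL_or_ERR, " fmt, ##__VA_ARGS__); \
	} while (0)
#else
#define MY_LOG(dp, fmt, ...) do { } while (0)
#endif

int main(void)
{
	struct my_dev d = { .name = "sg0" };
	struct my_dev *missing = NULL;

	MY_LOG(&d, "open count prior=%d\n", 1);	/* device context known */
	MY_LOG(missing, "fall back path\n");	/* no device context */
	return 0;
}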
Signed-off-by: Douglas Gilbert
---
 drivers/scsi/sg.c | 254 ++++++++++++++++++++++------------------------
 1 file changed, 123 insertions(+), 131 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 7be3d777dbd4..291c278451ef 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -173,8 +173,8 @@ static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file, struct sg_request **o_srp); static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp, u8 *cmnd, int timeout, int blocking); -static int sg_read_oxfer(struct sg_request *srp, char __user *outp, - int num_read_xfer); +static int sg_rd_append(struct sg_request *srp, char __user *outp, + int num_xfer); static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp); static void sg_build_reserve(struct sg_fd *sfp, int req_size); static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, @@ -192,6 +192,26 @@ static void sg_device_destroy(struct kref *kref); #define SZ_SG_IOVEC sizeof(sg_iovec_t) #define SZ_SG_REQ_INFO sizeof(sg_req_info_t) +#if IS_ENABLED(CONFIG_SCSI_LOGGING) +#define SG_LOG(depth, sdp, fmt, a...) \ + do { \ + char _b[160]; \ + int _tid = (current ? current->pid : -1); \ + \ + if ((sdp) && (sdp)->disk) { \ + snprintf(_b, sizeof(_b), "%s: tid=%d", \ + (sdp)->disk->disk_name, _tid); \ + SCSI_LOG_TIMEOUT(depth, \ + sdev_prefix_printk(KERN_INFO, \ + (sdp)->device, _b, fmt, ##a)); \ + } else \ + pr_info("sg: sdp=NULL_or_ERR, " fmt, ##a); \ + } while (0) +#else +#define SG_LOG(depth, sdp, fmt, a...) +#endif /* end of CONFIG_SCSI_LOGGING conditional */ + + #define sg_printk(prefix, sdp, fmt, a...) \ sdev_prefix_printk(prefix, (sdp)->device, \ (sdp)->disk->disk_name, fmt, ##a) @@ -286,8 +306,9 @@ sg_open(struct inode *inode, struct file *filp) if (IS_ERR(sdp)) return PTR_ERR(sdp); - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_open: flags=0x%x\n", op_flags)); + SG_LOG(3, sdp, "%s: minor=%d, op_flags=0x%x; %s count prior=%d%s\n", + __func__, min_dev, op_flags, "device open", sdp->open_cnt, + ((op_flags & O_NONBLOCK) ?
" O_NONBLOCK" : "")); /* This driver's module count bumped by fops_get in */ /* Prevent the device driver from vanishing while we sleep */ @@ -376,9 +397,10 @@ sg_release(struct inode *inode, struct file *filp) sfp = filp->private_data; sdp = sfp->parentdp; + SG_LOG(3, sdp, "%s: device open count prior=%d\n", __func__, + sdp->open_cnt); if (!sdp) return -ENXIO; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_release\n")); mutex_lock(&sdp->open_rel_lock); scsi_autopm_put_device(sdp->device); @@ -423,10 +445,9 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) sfp = filp->private_data; sdp = sfp->parentdp; + SG_LOG(3, sdp, "%s: write(3rd arg) count=%d\n", __func__, (int)count); if (!sdp) return -ENXIO; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_write: count=%d\n", (int) count)); if (atomic_read(&sdp->detaching)) return -ENODEV; if (!((filp->f_flags & O_NONBLOCK) || @@ -447,8 +468,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) return -EIO; /* The minimum scsi command length is 6 bytes. */ if (!(srp = sg_add_request(sfp))) { - SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sdp, - "sg_write: queue full\n")); + SG_LOG(1, sdp, "%s: queue full\n", __func__); return -EDOM; } buf += SZ_SG_HEADER; @@ -463,11 +483,10 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) cmd_size = 12; } mutex_unlock(&sfp->f_mutex); - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sdp, - "sg_write: scsi opcode=0x%02x, cmd_size=%d\n", (int) opcode, cmd_size)); -/* Determine buffer size. */ + SG_LOG(4, sdp, "%s: scsi opcode=0x%02x, cmd_size=%d\n", __func__, + (unsigned int)opcode, cmd_size); input_size = count - cmd_size; - mxsize = (input_size > old_hdr.reply_len) ? input_size : old_hdr.reply_len; + mxsize = max_t(int, input_size, old_hdr.reply_len); mxsize -= SZ_SG_HEADER; input_size -= SZ_SG_HEADER; if (input_size < 0) { @@ -546,8 +565,7 @@ sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf, sfp->cmd_q = 1; /* when sg_io_hdr seen, set command queuing on */ if (!(srp = sg_add_request(sfp))) { - SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp, - "sg_new_write: queue full\n")); + SG_LOG(1, sfp->parentdp, "%s: queue full\n", __func__); return -EDOM; } srp->sg_io_owned = sg_io_owned; @@ -616,17 +634,16 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp, hp->host_status = 0; hp->driver_status = 0; hp->resid = 0; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n", - (int) cmnd[0], (int) hp->cmd_len)); + SG_LOG(4, sfp->parentdp, "%s: opcode=0x%02x, cmd_sz=%d\n", __func__, + (int)cmnd[0], hp->cmd_len); if (hp->dxfer_len >= SZ_256M) return -EINVAL; k = sg_start_req(srp, cmnd); if (k) { - SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp, - "sg_common_write: start_req err=%d\n", k)); + SG_LOG(1, sfp->parentdp, "%s: start_req err=%d\n", __func__, + k); sg_finish_rem_req(srp); sg_remove_request(sfp, srp); return k; /* probably out of space --> ENOMEM */ @@ -759,9 +776,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) sfp = filp->private_data; sdp = sfp->parentdp; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "%s: count=%d\n", __func__, - (int) count)); + SG_LOG(3, sdp, "%s: read() count=%d\n", __func__, (int)count); if (!sdp) return -ENXIO; @@ -885,7 +900,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) if (count > old_hdr->reply_len) count = old_hdr->reply_len; 
if (count > SZ_SG_HEADER) { - if (sg_read_oxfer(srp, buf, count - SZ_SG_HEADER)) { + if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) { retval = -EFAULT; goto free_old_hdr; } @@ -958,8 +973,8 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) sdp = sfp->parentdp; if (!sdp) return -ENXIO; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_ioctl: cmd=0x%x\n", (int) cmd_in)); + SG_LOG(6, sdp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in, + !!(filp->f_flags & O_NONBLOCK)); read_only = (O_RDWR != (filp->f_flags & O_ACCMODE)); switch (cmd_in) { @@ -1191,7 +1206,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return scsi_ioctl(sdp->device, cmd_in, p); } -#ifdef CONFIG_COMPAT +#if IS_ENABLED(CONFIG_COMPAT) static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) { @@ -1220,7 +1235,7 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) static __poll_t sg_poll(struct file *filp, poll_table * wait) { - __poll_t res = 0; + __poll_t p_res = 0; struct sg_device *sdp; struct sg_fd *sfp; struct sg_request *srp; @@ -1237,22 +1252,21 @@ sg_poll(struct file *filp, poll_table * wait) read_lock_irqsave(&sfp->rq_list_lock, iflags); list_for_each_entry(srp, &sfp->rq_list, entry) { /* if any read waiting, flag it */ - if ((0 == res) && (1 == srp->done) && (!srp->sg_io_owned)) - res = EPOLLIN | EPOLLRDNORM; + if ((p_res == 0) && (srp->done == 1) && (!srp->sg_io_owned)) + p_res = EPOLLIN | EPOLLRDNORM; ++count; } read_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (atomic_read(&sdp->detaching)) - res |= EPOLLHUP; + p_res |= EPOLLHUP; else if (!sfp->cmd_q) { if (0 == count) - res |= EPOLLOUT | EPOLLWRNORM; + p_res |= EPOLLOUT | EPOLLWRNORM; } else if (count < SG_MAX_QUEUE) - res |= EPOLLOUT | EPOLLWRNORM; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_poll: res=0x%x\n", (__force u32) res)); - return res; + p_res |= EPOLLOUT | EPOLLWRNORM; + SG_LOG(3, sdp, "%s: p_res=0x%x\n", __func__, (__force u32)p_res); + return p_res; } static int @@ -1263,11 +1277,9 @@ sg_fasync(int fd, struct file *filp, int mode) sfp = filp->private_data; sdp = sfp->parentdp; + SG_LOG(3, sdp, "%s: mode(%s)\n", __func__, (mode ? 
"add" : "remove")); if (!sdp) return -ENXIO; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_fasync: mode=%d\n", mode)); - return fasync_helper(fd, filp, mode, &sfp->async_qp); } @@ -1275,6 +1287,7 @@ static vm_fault_t sg_vma_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; + struct sg_device *sdp; struct sg_fd *sfp; unsigned long offset, len, sa; struct sg_scatter_hold *rsv_schp; @@ -1290,14 +1303,13 @@ sg_vma_fault(struct vm_fault *vmf) pr_warn("%s: sfp%s\n", __func__, nbp); goto out_err; } + sdp = sfp->parentdp; rsv_schp = &sfp->reserve; offset = vmf->pgoff << PAGE_SHIFT; if (offset >= rsv_schp->bufflen) return VM_FAULT_SIGBUS; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sfp->parentdp, - "sg_vma_fault: offset=%lu, scatg=%d\n", - offset, rsv_schp->k_use_sg)); sa = vma->vm_start; + SG_LOG(3, sdp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset); length = 1 << (PAGE_SHIFT + rsv_schp->page_order); for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) { len = vma->vm_end - sa; @@ -1337,9 +1349,8 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma) return -ENXIO; } req_sz = vma->vm_end - vma->vm_start; - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sfp->parentdp, - "sg_mmap starting, vm_start=%p, len=%d\n", - (void *) vma->vm_start, (int) req_sz)); + SG_LOG(3, sfp->parentdp, "%s: vm_start=%p, len=%d\n", __func__, + (void *)vma->vm_start, (int)req_sz); if (vma->vm_pgoff) return -EINVAL; /* want no offset */ rsv_schp = &sfp->reserve; @@ -1408,10 +1419,9 @@ sg_rq_end_io(struct request *rq, blk_status_t status) result = req->result; resid = req->resid_len; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sdp, - "sg_cmd_done: pack_id=%d, res=0x%x\n", - srp->header.pack_id, result)); srp->header.resid = resid; + SG_LOG(6, sdp, "%s: pack_id=%d, res=0x%x\n", __func__, + srp->header.pack_id, result); ms = jiffies_to_msecs(jiffies); srp->header.duration = (ms > srp->header.duration) ? 
(ms - srp->header.duration) : 0; @@ -1485,7 +1495,7 @@ static const struct file_operations sg_fops = { .write = sg_write, .poll = sg_poll, .unlocked_ioctl = sg_ioctl, -#ifdef CONFIG_COMPAT +#if IS_ENABLED(CONFIG_COMPAT) .compat_ioctl = sg_compat_ioctl, #endif .open = sg_open, @@ -1532,7 +1542,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) k = error; SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, scsidp, - "sg_alloc: dev=%d \n", k)); + "%s: dev=%d, sdp=0x%p ++\n", __func__, k, sdp)); sprintf(disk->disk_name, "sg%d", k); disk->first_minor = k; sdp->disk = disk; @@ -1651,8 +1661,7 @@ sg_device_destroy(struct kref *kref) idr_remove(&sg_index_idr, sdp->index); write_unlock_irqrestore(&sg_index_lock, flags); - SCSI_LOG_TIMEOUT(3, - sg_printk(KERN_INFO, sdp, "sg_device_destroy\n")); + SG_LOG(3, sdp, "%s: sdp=0x%p --\n", __func__, sdp); put_disk(sdp->disk); kfree(sdp); @@ -1674,8 +1683,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf) if (val > 1) return; /* only want to do following once per device */ - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "%s\n", __func__)); + SG_LOG(3, sdp, "%s: 0x%p\n", __func__, sdp); read_lock_irqsave(&sdp->sfd_lock, iflags); list_for_each_entry(sfp, &sdp->sfds, sfd_siblings) { @@ -1744,7 +1752,7 @@ init_sg(void) return rc; } -#ifndef CONFIG_SCSI_PROC_FS +#if !IS_ENABLED(CONFIG_SCSI_PROC_FS) static int sg_proc_init(void) { @@ -1755,9 +1763,8 @@ sg_proc_init(void) static void __exit exit_sg(void) { -#ifdef CONFIG_SCSI_PROC_FS - remove_proc_subtree("scsi/sg", NULL); -#endif /* CONFIG_SCSI_PROC_FS */ + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) + remove_proc_subtree("scsi/sg", NULL); scsi_unregister_interface(&sg_interface); class_destroy(sg_sysfs_class); sg_sysfs_valid = 0; @@ -1772,6 +1779,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd) int res; struct request *rq; struct scsi_request *req; + struct sg_device *sdp; struct sg_fd *sfp = srp->parentfp; sg_io_hdr_t *hp = &srp->header; int dxfer_len = (int) hp->dxfer_len; @@ -1781,18 +1789,18 @@ sg_start_req(struct sg_request *srp, u8 *cmd) struct sg_scatter_hold *rsv_schp = &sfp->reserve; struct request_queue *q = sfp->parentdp->device->request_queue; struct rq_map_data *md, map_data; - int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ; + int r0w = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ; u8 *long_cmdp = NULL; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_start_req: dxfer_len=%d\n", - dxfer_len)); - + sdp = sfp->parentdp; if (hp->cmd_len > BLK_MAX_CDB) { long_cmdp = kzalloc(hp->cmd_len, GFP_KERNEL); if (!long_cmdp) return -ENOMEM; + SG_LOG(5, sdp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp); } + SG_LOG(4, sdp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len, + (r0w ? "OUT" : "IN")); /* * NOTE @@ -1869,7 +1877,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd) struct iovec *iov = NULL; struct iov_iter i; - res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i); + res = import_iovec(r0w, hp->dxferp, iov_count, 0, &iov, &i); if (res < 0) return res; @@ -1904,9 +1912,8 @@ sg_finish_rem_req(struct sg_request *srp) struct sg_fd *sfp = srp->parentfp; struct sg_scatter_hold *req_schp = &srp->data; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_finish_rem_req: res_used=%d\n", - (int) srp->res_used)); + SG_LOG(4, sfp->parentdp, "%s: srp=0x%p%s\n", __func__, srp, + (srp->res_used) ? 
" rsv" : ""); if (srp->bio) ret = blk_rq_unmap_user(srp->bio); @@ -1953,9 +1960,8 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, ++blk_size; /* don't know why */ /* round request up to next highest SG_SECTOR_SZ byte boundary */ blk_size = ALIGN(blk_size, SG_SECTOR_SZ); - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_build_indirect: buff_size=%d, blk_size=%d\n", - buff_size, blk_size)); + SG_LOG(4, sfp->parentdp, "%s: buff_size=%d, blk_size=%d\n", + __func__, buff_size, blk_size); /* N.B. ret_sz carried into this block ... */ mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize); @@ -1994,18 +2000,14 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, scatter_elem_sz_prev = ret_sz; } } - - SCSI_LOG_TIMEOUT(5, sg_printk(KERN_INFO, sfp->parentdp, - "sg_build_indirect: k=%d, num=%d, ret_sz=%d\n", - k, num, ret_sz)); + SG_LOG(5, sfp->parentdp, "%s: k=%d, num=%d, ret_sz=%d\n", + __func__, k, num, ret_sz); } /* end of for loop */ schp->page_order = order; schp->k_use_sg = k; - SCSI_LOG_TIMEOUT(5, sg_printk(KERN_INFO, sfp->parentdp, - "sg_build_indirect: k_use_sg=%d, rem_sz=%d\n", - k, rem_sz)); - + SG_LOG(5, sfp->parentdp, "%s: k_use_sg=%d, order=%d\n", __func__, + k, order); schp->bufflen = blk_size; if (rem_sz > 0) /* must have failed */ return -ENOMEM; @@ -2023,51 +2025,53 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp) { - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_remove_scat: k_use_sg=%d\n", schp->k_use_sg)); + SG_LOG(4, sfp->parentdp, "%s: num_sgat=%d\n", __func__, + schp->k_use_sg); if (schp->pages && schp->sglist_len > 0) { if (!schp->dio_in_use) { int k; for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) { - SCSI_LOG_TIMEOUT(5, - sg_printk(KERN_INFO, sfp->parentdp, - "sg_remove_scat: k=%d, pg=0x%p\n", - k, schp->pages[k])); + SG_LOG(5, sfp->parentdp, + "%s: pg[%d]=0x%p --\n", __func__, k, + schp->pages[k]); __free_pages(schp->pages[k], schp->page_order); } - kfree(schp->pages); } } memset(schp, 0, sizeof (*schp)); } +/* + * For sg v1 and v2 interface: with a command yielding a data-in buffer, after + * it has arrived in kernel memory, this function copies it to the user space, + * appended to given struct sg_header object. 
+ */ static int -sg_read_oxfer(struct sg_request *srp, char __user *outp, int num_read_xfer) +sg_rd_append(struct sg_request *srp, char __user *outp, int num_xfer) { struct sg_scatter_hold *schp = &srp->data; int k, num; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp, - "sg_read_oxfer: num_read_xfer=%d\n", - num_read_xfer)); - if ((!outp) || (num_read_xfer <= 0)) + SG_LOG(4, srp->parentfp->parentdp, "%s: num_xfer=%d\n", __func__, + num_xfer); + if ((!outp) || (num_xfer <= 0)) return 0; num = 1 << (PAGE_SHIFT + schp->page_order); for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) { - if (num > num_read_xfer) { + if (num > num_xfer) { if (__copy_to_user(outp, page_address(schp->pages[k]), - num_read_xfer)) + num_xfer)) return -EFAULT; break; } else { if (__copy_to_user(outp, page_address(schp->pages[k]), num)) return -EFAULT; - num_read_xfer -= num; - if (num_read_xfer <= 0) + num_xfer -= num; + if (num_xfer <= 0) break; outp += num; } @@ -2081,8 +2085,8 @@ sg_build_reserve(struct sg_fd *sfp, int req_size) { struct sg_scatter_hold *schp = &sfp->reserve; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_build_reserve: req_size=%d\n", req_size)); + SG_LOG(3, sfp ? sfp->parentdp : NULL, "%s: buflen=%d\n", __func__, + req_size); do { if (req_size < PAGE_SIZE) req_size = PAGE_SIZE; @@ -2102,8 +2106,7 @@ sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size) int k, num, rem; srp->res_used = 1; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp, - "sg_link_reserve: size=%d\n", size)); + SG_LOG(4, sfp->parentdp, "%s: size=%d\n", __func__, size); rem = size; num = 1 << (PAGE_SHIFT + rsv_schp->page_order); @@ -2121,8 +2124,7 @@ sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size) } if (k >= rsv_schp->k_use_sg) - SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp, - "sg_link_reserve: BAD size\n")); + SG_LOG(1, sfp->parentdp, "%s: BAD size\n", __func__); } static void @@ -2130,9 +2132,8 @@ sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp) { struct sg_scatter_hold *req_schp = &srp->data; - SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp, - "sg_unlink_reserve: req->k_use_sg=%d\n", - (int) req_schp->k_use_sg)); + SG_LOG(4, srp->parentfp->parentdp, "%s: req->k_use_sg=%d\n", __func__, + (int)req_schp->k_use_sg); req_schp->k_use_sg = 0; req_schp->bufflen = 0; req_schp->pages = NULL; @@ -2223,18 +2224,15 @@ sg_add_sfp(struct sg_device *sdp) } list_add_tail(&sfp->sfd_siblings, &sdp->sfds); write_unlock_irqrestore(&sdp->sfd_lock, iflags); - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_add_sfp: sfp=0x%p\n", sfp)); + SG_LOG(3, sdp, "%s: sfp=0x%p\n", __func__, sfp); if (unlikely(sg_big_buff != def_reserved_size)) sg_big_buff = def_reserved_size; bufflen = min_t(int, sg_big_buff, max_sectors_bytes(sdp->device->request_queue)); sg_build_reserve(sfp, bufflen); - SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, - "sg_add_sfp: bufflen=%d, k_use_sg=%d\n", - sfp->reserve.bufflen, - sfp->reserve.k_use_sg)); + SG_LOG(3, sdp, "%s: bufflen=%d, k_use_sg=%d\n", __func__, + sfp->reserve.bufflen, sfp->reserve.k_use_sg); kref_get(&sdp->d_ref); __module_get(THIS_MODULE); @@ -2260,15 +2258,12 @@ sg_remove_sfp_usercontext(struct work_struct *work) write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (sfp->reserve.bufflen > 0) { - SCSI_LOG_TIMEOUT(6, sg_printk(KERN_INFO, sdp, - "sg_remove_sfp: bufflen=%d, k_use_sg=%d\n", - (int) sfp->reserve.bufflen, - (int) sfp->reserve.k_use_sg)); + SG_LOG(6, sdp, "%s: bufflen=%d, 
k_use_sg=%d\n", __func__, + (int)sfp->reserve.bufflen, (int)sfp->reserve.k_use_sg); sg_remove_scat(sfp, &sfp->reserve); } - SCSI_LOG_TIMEOUT(6, sg_printk(KERN_INFO, sdp, - "sg_remove_sfp: sfp=0x%p\n", sfp)); + SG_LOG(6, sdp, "%s: sfp=0x%p\n", __func__, sfp); kfree(sfp); scsi_device_put(sdp->device); @@ -2291,7 +2286,6 @@ sg_remove_sfp(struct kref *kref) schedule_work(&sfp->ew.work); } -#ifdef CONFIG_SCSI_PROC_FS static int sg_idr_max_id(int id, void *p, void *data) { @@ -2303,19 +2297,6 @@ sg_idr_max_id(int id, void *p, void *data) return 0; } -static int -sg_last_dev(void) -{ - int k = -1; - unsigned long iflags; - - read_lock_irqsave(&sg_index_lock, iflags); - idr_for_each(&sg_index_idr, sg_idr_max_id, &k); - read_unlock_irqrestore(&sg_index_lock, iflags); - return k + 1; /* origin 1 */ -} -#endif - /* must be called with sg_index_lock held */ static struct sg_device * sg_lookup_dev(int dev) @@ -2345,7 +2326,7 @@ sg_get_dev(int dev) return sdp; } -#ifdef CONFIG_SCSI_PROC_FS +#if IS_ENABLED(CONFIG_SCSI_PROC_FS) /* long, almost to end of file */ static int sg_proc_seq_show_int(struct seq_file *s, void *v); static int sg_proc_single_open_adio(struct inode *inode, struct file *file); @@ -2420,6 +2401,17 @@ sg_proc_init(void) return 0; } +static int +sg_last_dev(void) +{ + int k = -1; + unsigned long iflags; + + read_lock_irqsave(&sg_index_lock, iflags); + idr_for_each(&sg_index_idr, sg_idr_max_id, &k); + read_unlock_irqrestore(&sg_index_lock, iflags); + return k + 1; /* origin 1 */ +} static int sg_proc_seq_show_int(struct seq_file *s, void *v) @@ -2678,7 +2670,7 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v) return 0; } -#endif /* CONFIG_SCSI_PROC_FS */ +#endif /* CONFIG_SCSI_PROC_FS (~800 lines back) */ module_init(init_sg); module_exit(exit_sg);
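To make the SG_LOG semantics above concrete, here is a minimal, hypothetical call site. It is not part of the patch; the helper function and its arguments are invented for illustration. With CONFIG_SCSI_LOGGING enabled and a usable sdp, the macro routes through SCSI_LOG_TIMEOUT() with a disk-name and thread-id prefix; with a NULL sdp it falls back to pr_info():

/* illustration only; not driver code */
static void example_log_state(struct sg_device *sdp, int pack_id)
{
	/* depth 3: emitted when the SCSI timeout logging level is >= 3 */
	SG_LOG(3, sdp, "%s: pack_id=%d\n", __func__, pack_id);

	/* NULL sdp: SG_LOG degrades to pr_info("sg: sdp=NULL_or_ERR, ...") */
	SG_LOG(1, NULL, "device vanished, pack_id=%d\n", pack_id);
}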
From patchwork Fri May 24 18:47:54 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960325
From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 04/19] sg: move header to uapi section Date: Fri, 24 May 2019 14:47:54 -0400 Message-Id: <20190524184809.25121-5-dgilbert@interlog.com> In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com>
Move the user interface part of scsi/sg.h into the new header file include/uapi/scsi/sg.h. Since scsi/sg.h includes the new header, other code that includes scsi/sg.h should not be impacted. (A short user-space example of this interface follows the patch.)
Signed-off-by: Douglas Gilbert --- include/scsi/sg.h | 268 ++------------------------------- include/uapi/scsi/sg.h | 329 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 338 insertions(+), 259 deletions(-) create mode 100644 include/uapi/scsi/sg.h
diff --git a/include/scsi/sg.h b/include/scsi/sg.h index f91bcca604e4..46fc7cbffd78 100644 --- a/include/scsi/sg.h +++ b/include/scsi/sg.h @@ -4,271 +4,21 @@ #include -/* - * History: - * Started: Aug 9 by Lawrence Foard (entropy@world.std.com), to allow user - * process control of SCSI devices. - * Development Sponsored by Killy Corp. NY NY - * - * Original driver (sg.h): - * Copyright (C) 1992 Lawrence Foard - * Version 2 and 3 extensions to driver: - * Copyright (C) 1998 - 2014 Douglas Gilbert - * - * Version: 3.5.36 (20140603) - * This version is for 2.6 and 3 series kernels. - * - * Documentation - * ============= - * A web site for the SG device driver can be found at: - * http://sg.danny.cz/sg [alternatively check the MAINTAINERS file] - * The documentation for the sg version 3 driver can be found at: - * http://sg.danny.cz/sg/p/sg_v3_ho.html - * Also see: /Documentation/scsi/scsi-generic.txt - * - * For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html - */ - #ifdef __KERNEL__ extern int sg_big_buff; /* for sysctl */ #endif +/* + * In version 3.9.01 of the sg driver, this file was split in two, with the + * bulk of the user space interface placed in the file included on the + * following line. + */ +#include -typedef struct sg_iovec /* same structure as used by readv() Linux system */ -{ /* call. It defines one scatter-gather element. 
*/ - void __user *iov_base; /* Starting address */ - size_t iov_len; /* Length in bytes */ -} sg_iovec_t; - - -typedef struct sg_io_hdr -{ - int interface_id; /* [i] 'S' for SCSI generic (required) */ - int dxfer_direction; /* [i] data transfer direction */ - unsigned char cmd_len; /* [i] SCSI command length */ - unsigned char mx_sb_len; /* [i] max length to write to sbp */ - unsigned short iovec_count; /* [i] 0 implies no scatter gather */ - unsigned int dxfer_len; /* [i] byte count of data transfer */ - void __user *dxferp; /* [i], [*io] points to data transfer memory - or scatter gather list */ - unsigned char __user *cmdp; /* [i], [*i] points to command to perform */ - void __user *sbp; /* [i], [*o] points to sense_buffer memory */ - unsigned int timeout; /* [i] MAX_UINT->no timeout (unit: millisec) */ - unsigned int flags; /* [i] 0 -> default, see SG_FLAG... */ - int pack_id; /* [i->o] unused internally (normally) */ - void __user * usr_ptr; /* [i->o] unused internally */ - unsigned char status; /* [o] scsi status */ - unsigned char masked_status;/* [o] shifted, masked scsi status */ - unsigned char msg_status; /* [o] messaging level data (optional) */ - unsigned char sb_len_wr; /* [o] byte count actually written to sbp */ - unsigned short host_status; /* [o] errors from host adapter */ - unsigned short driver_status;/* [o] errors from software driver */ - int resid; /* [o] dxfer_len - actual_transferred */ - unsigned int duration; /* [o] time taken by cmd (unit: millisec) */ - unsigned int info; /* [o] auxiliary information */ -} sg_io_hdr_t; /* 64 bytes long (on i386) */ - -#define SG_INTERFACE_ID_ORIG 'S' - -/* Use negative values to flag difference from original sg_header structure */ -#define SG_DXFER_NONE (-1) /* e.g. a SCSI Test Unit Ready command */ -#define SG_DXFER_TO_DEV (-2) /* e.g. a SCSI WRITE command */ -#define SG_DXFER_FROM_DEV (-3) /* e.g. a SCSI READ command */ -#define SG_DXFER_TO_FROM_DEV (-4) /* treated like SG_DXFER_FROM_DEV with the - additional property than during indirect - IO the user buffer is copied into the - kernel buffers before the transfer */ -#define SG_DXFER_UNKNOWN (-5) /* Unknown data direction */ - -/* following flag values can be "or"-ed together */ -#define SG_FLAG_DIRECT_IO 1 /* default is indirect IO */ -#define SG_FLAG_UNUSED_LUN_INHIBIT 2 /* default is overwrite lun in SCSI */ - /* command block (when <= SCSI_2) */ -#define SG_FLAG_MMAP_IO 4 /* request memory mapped IO */ -#define SG_FLAG_NO_DXFER 0x10000 /* no transfer of kernel buffers to/from */ - /* user space (debug indirect IO) */ -/* defaults:: for sg driver: Q_AT_HEAD; for block layer: Q_AT_TAIL */ -#define SG_FLAG_Q_AT_TAIL 0x10 -#define SG_FLAG_Q_AT_HEAD 0x20 - -/* following 'info' values are "or"-ed together */ -#define SG_INFO_OK_MASK 0x1 -#define SG_INFO_OK 0x0 /* no sense, host nor driver "noise" */ -#define SG_INFO_CHECK 0x1 /* something abnormal happened */ - -#define SG_INFO_DIRECT_IO_MASK 0x6 -#define SG_INFO_INDIRECT_IO 0x0 /* data xfer via kernel buffers (or no xfer) */ -#define SG_INFO_DIRECT_IO 0x2 /* direct IO requested and performed */ -#define SG_INFO_MIXED_IO 0x4 /* part direct, part indirect IO */ - - -typedef struct sg_scsi_id { /* used by SG_GET_SCSI_ID ioctl() */ - int host_no; /* as in "scsi" where 'n' is one of 0, 1, 2 etc */ - int channel; - int scsi_id; /* scsi id of target device */ - int lun; - int scsi_type; /* TYPE_... 
defined in scsi/scsi.h */ - short h_cmd_per_lun;/* host (adapter) maximum commands per lun */ - short d_queue_depth;/* device (or adapter) maximum queue length */ - int unused[2]; /* probably find a good use, set 0 for now */ -} sg_scsi_id_t; /* 32 bytes long on i386 */ - -typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ - char req_state; /* 0 -> not used, 1 -> written, 2 -> ready to read */ - char orphan; /* 0 -> normal request, 1 -> from interruped SG_IO */ - char sg_io_owned; /* 0 -> complete with read(), 1 -> owned by SG_IO */ - char problem; /* 0 -> no problem detected, 1 -> error to report */ - int pack_id; /* pack_id associated with request */ - void __user *usr_ptr; /* user provided pointer (in new interface) */ - unsigned int duration; /* millisecs elapsed since written (req_state==1) - or request duration (req_state==2) */ - int unused; -} sg_req_info_t; /* 20 bytes long on i386 */ - - -/* IOCTLs: Those ioctls that are relevant to the SG 3.x drivers follow. - [Those that only apply to the SG 2.x drivers are at the end of the file.] - (_GET_s yield result via 'int *' 3rd argument unless otherwise indicated) */ - -#define SG_EMULATED_HOST 0x2203 /* true for emulated host adapter (ATAPI) */ - -/* Used to configure SCSI command transformation layer for ATAPI devices */ -/* Only supported by the ide-scsi driver */ -#define SG_SET_TRANSFORM 0x2204 /* N.B. 3rd arg is not pointer but value: */ - /* 3rd arg = 0 to disable transform, 1 to enable it */ -#define SG_GET_TRANSFORM 0x2205 - -#define SG_SET_RESERVED_SIZE 0x2275 /* request a new reserved buffer size */ -#define SG_GET_RESERVED_SIZE 0x2272 /* actual size of reserved buffer */ - -/* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */ -#define SG_GET_SCSI_ID 0x2276 /* Yields fd's bus, chan, dev, lun + type */ -/* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */ - -/* Override host setting and always DMA using low memory ( <16MB on i386) */ -#define SG_SET_FORCE_LOW_DMA 0x2279 /* 0-> use adapter setting, 1-> force */ -#define SG_GET_LOW_DMA 0x227a /* 0-> use all ram for dma; 1-> low dma ram */ - -/* When SG_SET_FORCE_PACK_ID set to 1, pack_id is input to read() which - tries to fetch a packet with a matching pack_id, waits, or returns EAGAIN. - If pack_id is -1 then read oldest waiting. When ...FORCE_PACK_ID set to 0 - then pack_id ignored by read() and oldest readable fetched. */ -#define SG_SET_FORCE_PACK_ID 0x227b -#define SG_GET_PACK_ID 0x227c /* Yields oldest readable pack_id (or -1) */ - -#define SG_GET_NUM_WAITING 0x227d /* Number of commands awaiting read() */ - -/* Yields max scatter gather tablesize allowed by current host adapter */ -#define SG_GET_SG_TABLESIZE 0x227F /* 0 implies can't do scatter gather */ - -#define SG_GET_VERSION_NUM 0x2282 /* Example: version 2.1.34 yields 20134 */ - -/* Returns -EBUSY if occupied. 3rd argument pointer to int (see next) */ -#define SG_SCSI_RESET 0x2284 -/* Associated values that can be given to SG_SCSI_RESET follow. - * SG_SCSI_RESET_NO_ESCALATE may be OR-ed to the _DEVICE, _TARGET, _BUS - * or _HOST reset value so only that action is attempted. 
*/ -#define SG_SCSI_RESET_NOTHING 0 -#define SG_SCSI_RESET_DEVICE 1 -#define SG_SCSI_RESET_BUS 2 -#define SG_SCSI_RESET_HOST 3 -#define SG_SCSI_RESET_TARGET 4 -#define SG_SCSI_RESET_NO_ESCALATE 0x100 - -/* synchronous SCSI command ioctl, (only in version 3 interface) */ -#define SG_IO 0x2285 /* similar effect as write() followed by read() */ - -#define SG_GET_REQUEST_TABLE 0x2286 /* yields table of active requests */ - -/* How to treat EINTR during SG_IO ioctl(), only in SG 3.x series */ -#define SG_SET_KEEP_ORPHAN 0x2287 /* 1 -> hold for read(), 0 -> drop (def) */ -#define SG_GET_KEEP_ORPHAN 0x2288 - -/* yields scsi midlevel's access_count for this SCSI device */ -#define SG_GET_ACCESS_COUNT 0x2289 - - -#define SG_SCATTER_SZ (8 * 4096) -/* Largest size (in bytes) a single scatter-gather list element can have. - The value used by the driver is 'max(SG_SCATTER_SZ, PAGE_SIZE)'. - This value should be a power of 2 (and may be rounded up internally). - If scatter-gather is not supported by adapter then this value is the - largest data block that can be read/written by a single scsi command. */ - -#define SG_DEFAULT_RETRIES 0 - -/* Defaults, commented if they differ from original sg driver */ -#define SG_DEF_FORCE_PACK_ID 0 -#define SG_DEF_KEEP_ORPHAN 0 -#define SG_DEF_RESERVED_SIZE SG_SCATTER_SZ /* load time option */ - -/* maximum outstanding requests, write() yields EDOM if exceeded */ -#define SG_MAX_QUEUE 16 - -#define SG_BIG_BUFF SG_DEF_RESERVED_SIZE /* for backward compatibility */ - -/* Alternate style type names, "..._t" variants preferred */ -typedef struct sg_io_hdr Sg_io_hdr; -typedef struct sg_io_vec Sg_io_vec; -typedef struct sg_scsi_id Sg_scsi_id; -typedef struct sg_req_info Sg_req_info; - - -/* vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv */ -/* The older SG interface based on the 'sg_header' structure follows. */ -/* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */ - -#define SG_MAX_SENSE 16 /* this only applies to the sg_header interface */ - -struct sg_header -{ - int pack_len; /* [o] reply_len (ie useless), ignored as input */ - int reply_len; /* [i] max length of expected reply (inc. sg_header) */ - int pack_id; /* [io] id number of packet (use ints >= 0) */ - int result; /* [o] 0==ok, else (+ve) Unix errno (best ignored) */ - unsigned int twelve_byte:1; - /* [i] Force 12 byte command length for group 6 & 7 commands */ - unsigned int target_status:5; /* [o] scsi status from target */ - unsigned int host_status:8; /* [o] host status (see "DID" codes) */ - unsigned int driver_status:8; /* [o] driver status+suggestion */ - unsigned int other_flags:10; /* unused */ - unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] Output in 3 cases: - when target_status is CHECK_CONDITION or - when target_status is COMMAND_TERMINATED or - when (driver_status & DRIVER_SENSE) is true. */ -}; /* This structure is 36 bytes long on i386 */ - - -/* IOCTLs: The following are not required (or ignored) when the sg_io_hdr_t - interface is used. They are kept for backward compatibility with - the original and version 2 drivers. */ - -#define SG_SET_TIMEOUT 0x2201 /* unit: jiffies (10ms on i386) */ -#define SG_GET_TIMEOUT 0x2202 /* yield timeout as _return_ value */ - -/* Get/set command queuing state per fd (default is SG_DEF_COMMAND_Q. - Each time a sg_io_hdr_t object is seen on this file descriptor, this - command queuing flag is set on (overriding the previous setting). 
*/ -#define SG_GET_COMMAND_Q 0x2270 /* Yields 0 (queuing off) or 1 (on) */ -#define SG_SET_COMMAND_Q 0x2271 /* Change queuing state with 0 or 1 */ - -/* Turn on/off error sense trace (1 and 0 respectively, default is off). - Try using: "# cat /proc/scsi/sg/debug" instead in the v3 driver */ -#define SG_SET_DEBUG 0x227e /* 0 -> turn off debug */ - -#define SG_NEXT_CMD_LEN 0x2283 /* override SCSI command length with given - number on the next write() on this file descriptor */ - - -/* Defaults, commented if they differ from original sg driver */ #ifdef __KERNEL__ -#define SG_DEFAULT_TIMEOUT_USER (60*USER_HZ) /* HZ == 'jiffies in 1 second' */ -#else -#define SG_DEFAULT_TIMEOUT (60*HZ) /* HZ == 'jiffies in 1 second' */ +#define SG_DEFAULT_TIMEOUT_USER (60*USER_HZ) /* HZ == 'jiffies in 1 second' */ #endif -#define SG_DEF_COMMAND_Q 0 /* command queuing is always on when - the new interface is used */ -#define SG_DEF_UNDERRUN_FLAG 0 +#undef SG_DEFAULT_TIMEOUT /* because of conflicting define in sg.c */ -#endif +#endif /* end of ifndef _SCSI_GENERIC_H guard */ diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h new file mode 100644 index 000000000000..bb1be50d7b01 --- /dev/null +++ b/include/uapi/scsi/sg.h @@ -0,0 +1,329 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _UAPI_SCSI_SG_H +#define _UAPI_SCSI_SG_H + +/* + * History: + * Started: Aug 9 by Lawrence Foard (entropy@world.std.com), to allow user + * process control of SCSI devices. + * Development Sponsored by Killy Corp. NY NY + * + * Original driver (sg.h): + * Copyright (C) 1992 Lawrence Foard + * + * Later extensions (versions 2, 3 and 4) to driver: + * Copyright (C) 1998 - 2018 Douglas Gilbert + * + * Version 4.0.11 (20190502) + * This version is for Linux 4 and 5 series kernels. + * + * Documentation + * ============= + * A web site for the SG device driver can be found at: + * http://sg.danny.cz/sg [alternatively check the MAINTAINERS file] + * The documentation for the sg version 3 driver can be found at: + * http://sg.danny.cz/sg/p/sg_v3_ho.html + * Also see: /Documentation/scsi/scsi-generic.txt + * + * For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html + */ + +#include +#include + +/* bsg.h contains the sg v4 user space interface structure (sg_io_v4). */ +#include + +/* + * Same structure as used by readv() call. It defines one scatter-gather + * element. "Scatter-gather" is abbreviated to "sgat" in this driver to + * avoid confusion with this driver's name. + */ +typedef struct sg_iovec { + void __user *iov_base; /* Starting address (of a byte) */ + size_t iov_len; /* Length in bytes */ +} sg_iovec_t; + + +typedef struct sg_io_hdr { + int interface_id; /* [i] 'S' for SCSI generic (required) */ + int dxfer_direction; /* [i] data transfer direction */ + unsigned char cmd_len; /* [i] SCSI command length */ + unsigned char mx_sb_len;/* [i] max length to write to sbp */ + unsigned short iovec_count; /* [i] 0 implies no sgat list */ + unsigned int dxfer_len; /* [i] byte count of data transfer */ + /* dxferp points to data transfer memory or scatter gather list */ + void __user *dxferp; /* [i], [*io] */ + unsigned char __user *cmdp;/* [i], [*i] points to command to perform */ + void __user *sbp; /* [i], [*o] points to sense_buffer memory */ + unsigned int timeout; /* [i] MAX_UINT->no timeout (unit: millisec) */ + unsigned int flags; /* [i] 0 -> default, see SG_FLAG... 
*/ + int pack_id; /* [i->o] unused internally (normally) */ + void __user *usr_ptr; /* [i->o] unused internally */ + unsigned char status; /* [o] scsi status */ + unsigned char masked_status;/* [o] shifted, masked scsi status */ + unsigned char msg_status;/* [o] messaging level data (optional) */ + unsigned char sb_len_wr; /* [o] byte count actually written to sbp */ + unsigned short host_status; /* [o] errors from host adapter */ + unsigned short driver_status;/* [o] errors from software driver */ + int resid; /* [o] dxfer_len - actual_transferred */ + /* unit may be nanoseconds after SG_SET_GET_EXTENDED ioctl use */ + unsigned int duration; /* [o] time taken by cmd (unit: millisec) */ + unsigned int info; /* [o] auxiliary information */ +} sg_io_hdr_t; + +#define SG_INTERFACE_ID_ORIG 'S' + +/* Use negative values to flag difference from original sg_header structure */ +#define SG_DXFER_NONE (-1) /* e.g. a SCSI Test Unit Ready command */ +#define SG_DXFER_TO_DEV (-2) /* data-out buffer e.g. SCSI WRITE command */ +#define SG_DXFER_FROM_DEV (-3) /* data-in buffer e.g. SCSI READ command */ +/* + * SG_DXFER_TO_FROM_DEV is treated like SG_DXFER_FROM_DEV with the additional + * property that during indirect IO the user buffer is copied into the kernel + * buffers _before_ the transfer from the device takes place. Useful if short + * DMA transfers (less than requested) are not reported (e.g. resid always 0). + */ +#define SG_DXFER_TO_FROM_DEV (-4) +#define SG_DXFER_UNKNOWN (-5) /* Unknown data direction, do not use */ + +/* following flag values can be OR-ed together in v3::flags or v4::flags */ +#define SG_FLAG_DIRECT_IO 1 /* default is indirect IO */ +/* SG_FLAG_UNUSED_LUN_INHIBIT is ignored in sg v4 driver */ +#define SG_FLAG_UNUSED_LUN_INHIBIT 2 /* ignored, was LUN overwrite in cdb */ +#define SG_FLAG_MMAP_IO 4 /* request memory mapped IO */ +/* no transfer of kernel buffers to/from user space; used for sharing */ +#define SG_FLAG_NO_DXFER 0x10000 +/* defaults: for sg driver (v3_v4): Q_AT_HEAD; for block layer: Q_AT_TAIL */ +#define SG_FLAG_Q_AT_TAIL 0x10 +#define SG_FLAG_Q_AT_HEAD 0x20 + +/* Output (potentially OR-ed together) in v3::info or v4::info field */ +#define SG_INFO_OK_MASK 0x1 +#define SG_INFO_OK 0x0 /* no sense, host nor driver "noise" */ +#define SG_INFO_CHECK 0x1 /* something abnormal happened */ + +#define SG_INFO_DIRECT_IO_MASK 0x6 +#define SG_INFO_INDIRECT_IO 0x0 /* data xfer via kernel buffers (or no xfer) */ +#define SG_INFO_DIRECT_IO 0x2 /* direct IO requested and performed */ +#define SG_INFO_MIXED_IO 0x4 /* not used, always 0 */ +#define SG_INFO_DEVICE_DETACHING 0x8 /* completed successfully but ... */ +#define SG_INFO_ABORTED 0x10 /* this command has been aborted */ +#define SG_INFO_MRQ_FINI 0x20 /* marks multi-reqs that have finished */ + +/* + * Pointer to object of this structure filled by ioctl(SG_GET_SCSI_ID). Last + * field changed in v4 driver, was 'int unused[2]' so remains the same size. + */ +typedef struct sg_scsi_id { + int host_no; /* as in "scsi" where 'n' is one of 0, 1, 2 etc */ + int channel; + int scsi_id; /* scsi id of target device */ + int lun; /* lower 32 bits of internal 64 bit integer */ + int scsi_type; /* TYPE_... 
defined in scsi/scsi.h */ + short h_cmd_per_lun;/* host (adapter) maximum commands per lun */ + short d_queue_depth;/* device (or adapter) maximum queue length */ + int unused[2]; +} sg_scsi_id_t; + +/* For backward compatibility v4 driver yields at most SG_MAX_QUEUE of these */ +typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ + char req_state; /* See 'enum sg_rq_state' definition in v4 driver */ + char orphan; /* 0 -> normal request, 1 -> from interrupted SG_IO */ + /* sg_io_owned set implies synchronous, clear implies asynchronous */ + char sg_io_owned;/* 0 -> complete with read(), 1 -> owned by SG_IO */ + char problem; /* 0 -> no problem detected, 1 -> error to report */ + /* If SG_CTL_FLAGM_TAG_FOR_PACK_ID set on fd then next field is tag */ + int pack_id; /* pack_id, in v4 driver may be tag instead */ + void __user *usr_ptr; /* user provided pointer in v3+v4 interface */ + unsigned int duration; + int unused; +} sg_req_info_t; + +/* + * IOCTLs: Those ioctls that are relevant to the SG 3.x drivers follow. + * [Those that only apply to the SG 2.x drivers are at the end of the file.] + * (_GET_s yield result via 'int *' 3rd argument unless otherwise indicated) + */ + +#define SG_EMULATED_HOST 0x2203 /* true for emulated host adapter (ATAPI) */ + +/* + * Used to configure SCSI command transformation layer for ATAPI devices. + * Only supported by the ide-scsi driver. 20181014 No longer supported, this + * driver passes them to the mid-level which returns an EINVAL (22) errno. + * + * Original note: N.B. 3rd arg is not pointer but value: 3rd arg = 0 to + * disable transform, 1 to enable it + */ +#define SG_SET_TRANSFORM 0x2204 +#define SG_GET_TRANSFORM 0x2205 + +#define SG_SET_RESERVED_SIZE 0x2275 /* request new reserved buffer size */ +#define SG_GET_RESERVED_SIZE 0x2272 /* actual size of reserved buffer */ + +/* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */ +#define SG_GET_SCSI_ID 0x2276 /* Yields fd's bus, chan, dev, lun + type */ +/* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */ + +/* Override host setting and always DMA using low memory ( <16MB on i386) */ +#define SG_SET_FORCE_LOW_DMA 0x2279 /* 0-> use adapter setting, 1-> force */ +#define SG_GET_LOW_DMA 0x227a /* 0-> use all ram for dma; 1-> low dma ram */ + +/* + * When SG_SET_FORCE_PACK_ID set to 1, pack_id (or tag) is input to read() or + * ioctl(SG_IO_RECEIVE). These functions wait until matching packet (request/ + * command) is finished but they will return with EAGAIN quickly if the file + * descriptor was opened O_NONBLOCK or (in v4) if SGV4_FLAG_IMMED is given. + * The tag is used when SG_CTL_FLAGM_TAG_FOR_PACK_ID is set on the parent + * file descriptor (default: use pack_id). If pack_id or tag is -1 then read + * the oldest waiting; this is the same action as when FORCE_PACK_ID is + * clear on the parent file descriptor. In the v4 interface the pack_id is + * placed in the sg_io_v4::request_extra field. + */ +#define SG_SET_FORCE_PACK_ID 0x227b /* pack_id or in v4 can be tag */ +#define SG_GET_PACK_ID 0x227c /* Yields oldest readable pack_id/tag, or -1 */ + +#define SG_GET_NUM_WAITING 0x227d /* Number of commands awaiting read() */ + +/* Yields max scatter gather tablesize allowed by current host adapter */ +#define SG_GET_SG_TABLESIZE 0x227F /* 0 implies can't do scatter gather */ + +/* + * Integer form of version number: [x]xyyzz where [x] is empty when x=0.
+ * String form of version number: "[x]x.[y]y.zz" + */ +#define SG_GET_VERSION_NUM 0x2282 /* Example: version "2.1.34" yields 20134 */ + +/* Returns -EBUSY if occupied. 3rd argument pointer to int (see next) */ +#define SG_SCSI_RESET 0x2284 +/* + * Associated values that can be given to SG_SCSI_RESET follow. + * SG_SCSI_RESET_NO_ESCALATE may be OR-ed to the _DEVICE, _TARGET, _BUS + * or _HOST reset value so only that action is attempted. + */ +#define SG_SCSI_RESET_NOTHING 0 +#define SG_SCSI_RESET_DEVICE 1 +#define SG_SCSI_RESET_BUS 2 +#define SG_SCSI_RESET_HOST 3 +#define SG_SCSI_RESET_TARGET 4 +#define SG_SCSI_RESET_NO_ESCALATE 0x100 + +/* synchronous SCSI command ioctl, (for version 3 and 4 interface) */ +#define SG_IO 0x2285 /* similar effect as write() followed by read() */ + +#define SG_GET_REQUEST_TABLE 0x2286 /* yields table of active requests */ + +/* How to treat EINTR during SG_IO ioctl(), only in sg v3 and v4 driver */ +#define SG_SET_KEEP_ORPHAN 0x2287 /* 1 -> hold for read(), 0 -> drop (def) */ +#define SG_GET_KEEP_ORPHAN 0x2288 + +/* + * Yields scsi midlevel's access_count for this SCSI device. 20181014 No + * longer available, always yields 1. + */ +#define SG_GET_ACCESS_COUNT 0x2289 + + +/* + * Default size (in bytes) a single scatter-gather list element can have. + * The value used by the driver is 'max(SG_SCATTER_SZ, PAGE_SIZE)'. This + * value should be a power of 2 (and may be rounded up internally). In the + * v4 driver this can be changed by ioctl(SG_SET_GET_EXTENDED{SGAT_ELEM_SZ}). + */ +#define SG_SCATTER_SZ (8 * 4096) + +/* sg driver users' code should handle retries (e.g. from Unit Attentions) */ +#define SG_DEFAULT_RETRIES 0 + +/* Defaults, commented if they differ from original sg driver */ +#define SG_DEF_FORCE_PACK_ID 0 +#define SG_DEF_KEEP_ORPHAN 0 +#define SG_DEF_RESERVED_SIZE SG_SCATTER_SZ /* load time option */ + +/* + * Maximum outstanding requests (i.e write()s without corresponding read()s) + * yields EDOM from write() if exceeded. This limit only applies prior to + * version 3.9 . It is still used as a maximum number of sg_req_info objects + * that are returned from the SG_GET_REQUEST_TABLE ioctl. + */ +#define SG_MAX_QUEUE 16 + +#define SG_BIG_BUFF SG_DEF_RESERVED_SIZE /* for backward compatibility */ + +/* + * Alternate style type names, "..._t" variants (as found in the + * 'typedef struct * {};' definitions above) are preferred to these: + */ +typedef struct sg_io_hdr Sg_io_hdr; +typedef struct sg_io_vec Sg_io_vec; +typedef struct sg_scsi_id Sg_scsi_id; +typedef struct sg_req_info Sg_req_info; + + +/* vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv */ +/* The v1+v2 SG interface based on the 'sg_header' structure follows. */ +/* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */ + +#define SG_MAX_SENSE 16 /* this only applies to the sg_header interface */ + +struct sg_header { + int pack_len; /* [o] reply_len (ie useless), ignored as input */ + int reply_len; /* [i] max length of expected reply (inc. 
sg_header) */ + int pack_id; /* [io] id number of packet (use ints >= 0) */ + int result; /* [o] 0==ok, else (+ve) Unix errno (best ignored) */ + unsigned int twelve_byte:1; + /* [i] Force 12 byte command length for group 6 & 7 commands */ + unsigned int target_status:5; /* [o] scsi status from target */ + unsigned int host_status:8; /* [o] host status (see "DID" codes) */ + unsigned int driver_status:8; /* [o] driver status+suggestion */ + unsigned int other_flags:10; /* unused */ + unsigned char sense_buffer[SG_MAX_SENSE]; + /* + * [o] Output in 3 cases: + * when target_status is CHECK_CONDITION or + * when target_status is COMMAND_TERMINATED or + * when (driver_status & DRIVER_SENSE) is true. + */ +}; + +/* + * IOCTLs: The following are not required (or ignored) when the v3 or v4 + * interface is used as those structures contain a timeout field. These + * ioctls are kept for backward compatibility with v1+v2 interfaces. + */ + +#define SG_SET_TIMEOUT 0x2201 /* unit: (user space) jiffies */ +#define SG_GET_TIMEOUT 0x2202 /* yield timeout as _return_ value */ + +/* + * Get/set command queuing state per fd (default is SG_DEF_COMMAND_Q). + * Each time an sg_io_hdr_t object is seen on this file descriptor, this + * command queuing flag is set on (overriding the previous setting). + * This setting defaults to 0 (i.e. no queuing) but gets set the first + * time that fd sees a v3 or v4 interface request. + */ +#define SG_GET_COMMAND_Q 0x2270 /* Yields 0 (queuing off) or 1 (on) */ +#define SG_SET_COMMAND_Q 0x2271 /* Change queuing state with 0 or 1 */ + +/* + * Turn on/off error sense trace (1 and 0 respectively, default is off). + * Try using: "# cat /proc/scsi/sg/debug" instead in the v3 driver + */ +#define SG_SET_DEBUG 0x227e /* 0 -> turn off debug */ + +/* + * override SCSI command length with given number on the next write() on + * this file descriptor (v1 and v2 interface only) + */ +#define SG_NEXT_CMD_LEN 0x2283 + +/* command queuing is always on when the v3 or v4 interface is used */ +#define SG_DEF_COMMAND_Q 0 + +#define SG_DEF_UNDERRUN_FLAG 0 + +/* If the timeout value in the v3_v4 interfaces is 0, this value is used */ +#define SG_DEFAULT_TIMEOUT (60*HZ) /* HZ == 'jiffies in 1 second' */ + +#endif /* end of _UAPI_SCSI_SG_H guard */
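Since this patch defines the user-facing v3 interface, a short user-space sketch may help readers unfamiliar with it. This is illustrative only (not from the patch): it assumes a device node /dev/sg0 exists and abbreviates error handling. It sends a 6-byte INQUIRY through the SG_IO ioctl declared above and prints the vendor and product strings from the response:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(void)
{
	unsigned char cdb[6] = {0x12, 0, 0, 0, 96, 0};	/* INQUIRY, 96 byte reply */
	unsigned char resp[96], sense[32];
	struct sg_io_hdr io;
	int fd = open("/dev/sg0", O_RDWR);

	if (fd < 0)
		return 1;
	memset(&io, 0, sizeof(io));
	io.interface_id = 'S';			/* SG_INTERFACE_ID_ORIG */
	io.dxfer_direction = SG_DXFER_FROM_DEV;	/* data-in command */
	io.cmd_len = sizeof(cdb);
	io.cmdp = cdb;
	io.dxfer_len = sizeof(resp);
	io.dxferp = resp;
	io.mx_sb_len = sizeof(sense);		/* room for sense data */
	io.sbp = sense;
	io.timeout = 20000;			/* unit: milliseconds */
	if (ioctl(fd, SG_IO, &io) < 0)
		return 1;
	if ((io.info & SG_INFO_OK_MASK) == SG_INFO_OK)
		printf("vendor=%.8s product=%.16s\n",
		       (char *)resp + 8, (char *)resp + 16);
	close(fd);
	return 0;
}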
From patchwork Fri May 24 18:47:55 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960341
From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 05/19] sg: replace rq array with lists Date: Fri, 24 May 2019 14:47:55 -0400 Message-Id: <20190524184809.25121-6-dgilbert@interlog.com> In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com>
Remove the fixed size array of 16 request elements per file descriptor and replace it with two linked lists per fd: one for active commands, the other a free list. sg_request objects are now kept, available for re-use, until their owning file descriptor is closed. The associated block request and scsi_request objects are released much earlier; their lifetime is unchanged from the v3 sg driver. The lifetime of the bio is also unchanged (but is stretched in a later patch). Add an enum for request state (sg_rq_state) and collect various flags into bit maps: one for requests (SG_FRQ_*) and the other for file descriptors (SG_FFD_*). Since these changes touch almost every function and the low level structures, this patch is big. With so many changes, the diff utility that generates the patch sometimes loses track. (A simplified sketch of the two-list scheme follows at the end of this patch.)
Signed-off-by: Douglas Gilbert Reported-by: kbuild test robot Reported-by: Dan Carpenter --- drivers/scsi/sg.c | 3451 +++++++++++++++++++++++++++++---------------- 1 file changed, 2231 insertions(+), 1220 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 291c278451ef..72ce51b3198c 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -57,28 +57,79 @@ static char *sg_version_date = "20140603"; #define SG_MAX_DEVS 32768 -/* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type +/* Comment out the following line to compile out SCSI_LOGGING stuff */ +#define SG_DEBUG 1 + +#if !IS_ENABLED(SG_DEBUG) +#if IS_ENABLED(DEBUG) /* If SG_DEBUG not defined, check for DEBUG */ +#define SG_DEBUG DEBUG +#endif +#endif + +/* + * SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type * of sg_io_hdr::cmd_len can only represent 255. All SCSI commands greater - * than 16 bytes are "variable length" whose length is a multiple of 4 + * than 16 bytes are "variable length" whose length is a multiple of 4, so + * 252 is the longest usable length here */ #define SG_MAX_CDB_SIZE 252 +#define uptr64(val) ((void __user *)(uintptr_t)(val)) +#define cuptr64(val) ((const void __user *)(uintptr_t)(val)) + +/* Following enum contains the states of sg_request::rq_st */ +enum sg_rq_state { + SG_RS_INACTIVE = 0, /* request not in use (e.g. 
on fl) */ + SG_RS_INFLIGHT, /* active: cmd/req issued, no response yet */ + SG_RS_AWAIT_RD, /* response received, awaiting read */ + SG_RS_DONE_RD, /* read is ongoing or done */ + SG_RS_BUSY, /* temporary state should rarely be seen */ +}; + +#define SG_TIME_UNIT_MS 0 /* milliseconds */ +#define SG_DEF_TIME_UNIT SG_TIME_UNIT_MS #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) +#define SG_FD_Q_AT_HEAD 0 +#define SG_DEFAULT_Q_AT SG_FD_Q_AT_HEAD /* for backward compatibility */ +#define SG_FL_MMAP_DIRECT (SG_FLAG_MMAP_IO | SG_FLAG_DIRECT_IO) + +/* Only take lower 4 bits of driver byte, all host byte and sense byte */ +#define SG_ML_RESULT_MSK 0x0fff00ff /* mid-level's 32 bit result value */ + +#define SG_SHARE_FD_UNUSED (-1) +#define SG_SHARE_FD_MASTER (-2) /* values >= 0 imply FD_SLAVE */ + +#define SG_PACK_ID_WILDCARD (-1) + +#define SG_ADD_RQ_MAX_RETRIES 40 /* to stop infinite _trylock(s) */ + +/* Bit positions (flags) for sg_request::frq_bm bitmask follow */ +#define SG_FRQ_IS_ORPHAN 1 /* owner of request gone */ +#define SG_FRQ_SYNC_INVOC 2 /* synchronous (blocking) invocation */ +#define SG_FRQ_DIO_IN_USE 3 /* false->indirect_IO,mmap; 1->dio */ +#define SG_FRQ_NO_US_XFER 4 /* no user space transfer of data */ +#define SG_FRQ_DEACT_ORPHAN 7 /* not keeping orphan so de-activate */ +#define SG_FRQ_BLK_PUT_REQ 9 /* set when blk_put_request() called */ + +/* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */ +#define SG_FFD_FORCE_PACKID 0 /* receive only given pack_id/tag */ +#define SG_FFD_CMD_Q 1 /* clear: only 1 active req per fd */ +#define SG_FFD_KEEP_ORPHAN 2 /* policy for this fd */ +#define SG_FFD_MMAP_CALLED 3 /* mmap(2) system call made on fd */ +#define SG_FFD_Q_AT_TAIL 5 /* set: queue reqs at tail of blk q */ + int sg_big_buff = SG_DEF_RESERVED_SIZE; -/* N.B. This variable is readable and writeable via - /proc/scsi/sg/def_reserved_size . Each time sg_open() is called a buffer - of this size (or less if there is not enough memory) will be reserved - for use by this file descriptor. [Deprecated usage: this variable is also - readable via /proc/sys/kernel/sg-big-buff if the sg driver is built into - the kernel (i.e. it is not a module).] */ -static int def_reserved_size = -1; /* picks up init parameter */ +/* + * This variable is accessible via /proc/scsi/sg/def_reserved_size . Each + * time sg_open() is called a sg_request of this size (or less if there is + * not enough memory) will be reserved for use by this file descriptor. 
+ */ +static int def_reserved_size = -1; /* picks up init parameter */ static int sg_allow_dio = SG_ALLOW_DIO_DEF; static int scatter_elem_sz = SG_SCATTER_SZ; -static int scatter_elem_sz_prev = SG_SCATTER_SZ; -#define SG_SECTOR_SZ 512 +#define SG_DEF_SECTOR_SZ 512 static int sg_add_device(struct device *, struct class_interface *); static void sg_remove_device(struct device *, struct class_interface *); @@ -91,106 +142,155 @@ static struct class_interface sg_interface = { .remove_dev = sg_remove_device, }; -struct sg_scatter_hold { /* holding area for scsi scatter gather info */ - u16 k_use_sg; /* Count of kernel scatter-gather pieces */ - unsigned int sglist_len; /* size of malloc'd scatter-gather list ++ */ - unsigned int bufflen; /* Size of (aggregate) data buffer */ - struct page **pages; - int page_order; - char dio_in_use; /* 0->indirect IO (or mmap), 1->dio */ - u8 cmd_opcode; /* first byte of command */ +/* Subset of sg_io_hdr found in , has only [i] and [i->o] fields */ +struct sg_slice_hdr3 { + int interface_id; + int dxfer_direction; + u8 cmd_len; + u8 mx_sb_len; + u16 iovec_count; + unsigned int dxfer_len; + void __user *dxferp; + u8 __user *cmdp; + void __user *sbp; + unsigned int timeout; + unsigned int flags; + int pack_id; + void __user *usr_ptr; +}; + +struct sg_scatter_hold { /* holding area for scsi scatter gather info */ + struct page **pages; /* num_sgat element array of struct page* */ + int buflen; /* capacity in bytes (dlen<=buflen) */ + int dlen; /* current valid data length of this req */ + u16 page_order; /* byte_len = (page_size*(2**page_order)) */ + u16 num_sgat; /* actual number of scatter-gather segments */ }; struct sg_device; /* forward declarations */ struct sg_fd; -struct sg_request { /* SG_MAX_QUEUE requests outstanding per file */ - struct list_head entry; /* list entry */ - struct sg_fd *parentfp; /* NULL -> not in use */ - struct sg_scatter_hold data; /* hold buffer, perhaps scatter list */ - sg_io_hdr_t header; /* scsi command+info, see */ - u8 sense_b[SCSI_SENSE_BUFFERSIZE]; - char res_used; /* 1 -> using reserve buffer, 0 -> not ... 
*/ - char orphan; /* 1 -> drop on sight, 0 -> normal */ - char sg_io_owned; /* 1 -> packet belongs to SG_IO */ - /* done protected by rq_list_lock */ - char done; /* 0->before bh, 1->before read, 2->read */ - struct request *rq; - struct bio *bio; - struct execute_work ew; +struct sg_request { /* active SCSI command or inactive on free list (fl) */ + struct list_head rq_entry; /* member of rq_list (active cmd) */ + struct list_head fl_entry; /* member of rq_fl */ + spinlock_t req_lck; + struct sg_scatter_hold sgat_h; /* hold buffer, perhaps scatter list */ + struct sg_slice_hdr3 s_hdr3; /* subset of sg_io_hdr */ + u32 duration; /* cmd duration in milliseconds */ + u32 rq_flags; /* hold user supplied flags */ + u32 rq_info; /* info supplied by v3 and v4 interfaces */ + u32 rq_result; /* packed scsi request result from LLD */ + int in_resid; /* requested-actual byte count on data-in */ + int pack_id; /* user provided packet identifier field */ + int sense_len; /* actual sense buffer length (data-in) */ + atomic_t rq_st; /* request state, holds an enum sg_rq_state */ + u8 cmd_opcode; /* first byte of SCSI cdb */ + u64 start_ns; /* starting point of command duration calc */ + unsigned long frq_bm[1]; /* see SG_FRQ_* defines above */ + u8 *sense_bp; /* alloc-ed sense buffer, as needed */ + struct sg_fd *parentfp; /* pointer to owning fd, even when on fl */ + struct request *rq; /* released in sg_rq_end_io(), bio kept */ + struct bio *bio; /* kept until this req -->SG_RS_INACTIVE */ + struct execute_work ew_orph; /* harvest orphan request */ }; struct sg_fd { /* holds the state of a file descriptor */ - struct list_head sfd_siblings; /* protected by device's sfd_lock */ + struct list_head sfd_entry; /* member of sg_device::sfds list */ struct sg_device *parentdp; /* owning device */ wait_queue_head_t read_wait; /* queue read until command done */ - rwlock_t rq_list_lock; /* protect access to list in req_arr */ - struct mutex f_mutex; /* protect against changes in this fd */ + struct mutex f_mutex; /* serialize ioctls on this fd */ + spinlock_t rq_list_lock;/* for rw-lock on sg_request lists [rcu] */ + struct list_head rq_list; /* head of inflight sg_request list */ + struct list_head rq_fl; /* head of sg_request free list */ int timeout; /* defaults to SG_DEFAULT_TIMEOUT */ int timeout_user; /* defaults to SG_DEFAULT_TIMEOUT_USER */ - struct sg_scatter_hold reserve; /* buffer for this file descriptor */ - struct list_head rq_list; /* head of request list */ - struct fasync_struct *async_qp; /* used by asynchronous notification */ - struct sg_request req_arr[SG_MAX_QUEUE];/* use as singly-linked list */ - char force_packid; /* 1 -> pack_id input to read(), 0 -> ignored */ - char cmd_q; /* 1 -> allow command queuing, 0 -> don't */ + int sgat_elem_sz; /* initialized to scatter_elem_sz */ + atomic_t submitted; /* number inflight or awaiting read */ + atomic_t waiting; /* number of requests awaiting read */ + unsigned long ffd_bm[1]; /* see SG_FFD_* defines above */ + pid_t tid; /* thread id when opened */ u8 next_cmd_len; /* 0: automatic, >0: use on next write() */ - char keep_orphan; /* 0 -> drop orphan (def), 1 -> keep for read() */ - char mmap_called; /* 0 -> mmap() never called on this fd */ - char res_in_use; /* 1 -> 'reserve' array in use */ + struct sg_request *rsv_srp;/* one reserve request per fd */ + struct fasync_struct *async_qp; /* used by asynchronous notification */ struct kref f_ref; - struct execute_work ew; + struct execute_work ew_fd; /* harvest all fd resources and lists */ };
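Note that the old one-byte flag fields (force_packid, cmd_q, keep_orphan, mmap_called) are folded into the single-word ffd_bm bitmap, so per-fd mode bits are read and flipped with the kernel's atomic bitops instead of under rq_list_lock. A minimal sketch of the pattern, using the SG_FFD_* bit positions defined above; the helper names sg_set_cmd_q() and sg_cmd_q_enabled() are illustrative only, later hunks in this patch open-code the same test_bit()/assign_bit() calls at the ioctl and poll sites:

/* one unsigned long suffices: all SG_FFD_* bit numbers are < BITS_PER_LONG */
static inline void sg_set_cmd_q(struct sg_fd *sfp, bool enable)
{
	assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, enable); /* atomic set or clear */
}

static inline bool sg_cmd_q_enabled(struct sg_fd *sfp)
{
	return test_bit(SG_FFD_CMD_Q, sfp->ffd_bm);    /* lock-free read */
}

Because set_bit()/clear_bit()/assign_bit() are atomic read-modify-write operations on the containing word, individual flags can be updated concurrently without taking a lock.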
-struct sg_device { /* holds the state of each scsi generic device */ +struct sg_device { /* holds the state of each scsi generic device */ struct scsi_device *device; - wait_queue_head_t open_wait; /* queue open() when O_EXCL present */ - struct mutex open_rel_lock; /* held when in open() or release() */ - int sg_tablesize; /* adapter's max scatter-gather table size */ + wait_queue_head_t open_wait; /* queue open() when O_EXCL present */ + struct mutex open_rel_lock; /* held when in open() or release() */ + int max_sgat_elems; /* adapter's max sgat number of elements */ + int max_sgat_sz; /* max number of bytes in sgat list */ u32 index; /* device index number */ - struct list_head sfds; - rwlock_t sfd_lock; /* protect access to sfd list */ - atomic_t detaching; /* 0->device usable, 1->device detaching */ + struct list_head sfds; /* head of sg_fd::sfd_entry list */ + rwlock_t sfd_llock; /* protect access to sfds list */ + atomic_t detaching; /* 0->device usable, 1->device detaching */ bool exclude; /* 1->open(O_EXCL) succeeded and is active */ + u8 sgdebug; /* 0->off, 1->sense, 9->dump dev, 10-> all devs */ int open_cnt; /* count of opens (perhaps < num(sfds) ) */ - char sgdebug; /* 0->off, 1->sense, 9->dump dev, 10-> all devs */ struct gendisk *disk; struct cdev * cdev; /* char_dev [sysfs: /sys/cdev/major/sg] */ struct kref d_ref; }; +struct sg_comm_wr_t { /* arguments to sg_common_write() */ + int timeout; + unsigned long frq_bm[1]; /* see SG_FRQ_* defines above */ + struct sg_io_hdr *h3p; + u8 *cmnd; +}; + /* tasklet or soft irq callback */ static void sg_rq_end_io(struct request *rq, blk_status_t status); /* Declarations of other static functions used before they are defined */ static int sg_proc_init(void); -static int sg_start_req(struct sg_request *srp, u8 *cmd); -static int sg_finish_rem_req(struct sg_request *srp); -static int sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, - int buff_size); -static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file, - const char __user *buf, size_t count, int blocking, - int read_only, int sg_io_owned, - struct sg_request **o_srp); -static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp, - u8 *cmnd, int timeout, int blocking); -static int sg_rd_append(struct sg_request *srp, char __user *outp, +static int sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, + int dxfer_dir); +// static int sg_finish_rem_req(struct sg_request *srp); +static void sg_finish_scsi_blk_rq(struct sg_request *srp); +static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen); +static int sg_submit(struct file *filp, struct sg_fd *sfp, + struct sg_io_hdr *hp, bool sync, + struct sg_request **o_srp); +static struct sg_request *sg_common_write(struct sg_fd *sfp, + struct sg_comm_wr_t *cwp); +static int sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer); -static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp); -static void sg_build_reserve(struct sg_fd *sfp, int req_size); -static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, - int size); -static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp); +static void sg_remove_sgat(struct sg_request *srp); static struct sg_fd *sg_add_sfp(struct sg_device *sdp); static void sg_remove_sfp(struct kref *); -static struct sg_request *sg_add_request(struct sg_fd *sfp); -static int sg_remove_request(struct sg_fd *sfp, struct sg_request *srp); +static struct sg_request *sg_find_srp_by_id(struct sg_fd 
*sfp, int pack_id); +static struct sg_request *sg_add_request(struct sg_fd *sfp, int dxfr_len, + struct sg_comm_wr_t *cwrp); +static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp); static struct sg_device *sg_get_dev(int dev); static void sg_device_destroy(struct kref *kref); +static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first, + int db_len); +static void sg_calc_sgat_param(struct sg_device *sdp); +static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str); +static void sg_rep_rq_state_fail(struct sg_device *sdp, + enum sg_rq_state exp_old_st, + enum sg_rq_state want_st, + enum sg_rq_state act_old_st); + +#define SZ_SG_HEADER ((int)sizeof(struct sg_header)) /* v1 and v2 header */ +#define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr)) /* v3 header */ +#define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info)) + +#define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE) +#define SG_RS_AWAIT_READ(srp) (atomic_read(&(srp)->rq_st) == SG_RS_AWAIT_RD) + +/* + * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages. + * 'depth' is a number between 1 (most severe) and 7 (most noisy, most + * information). All messages are logged as informational (KERN_INFO). In + * the unexpected situation where sdp is NULL the macro reverts to a pr_info + * and ignores CONFIG_SCSI_LOGGING and always prints to the log. + */ -#define SZ_SG_HEADER sizeof(struct sg_header) -#define SZ_SG_IO_HDR sizeof(sg_io_hdr_t) -#define SZ_SG_IOVEC sizeof(sg_iovec_t) -#define SZ_SG_REQ_INFO sizeof(sg_req_info_t) +#define SG_PROC_DEBUG_SZ 8192 #if IS_ENABLED(CONFIG_SCSI_LOGGING) #define SG_LOG(depth, sdp, fmt, a...) \ @@ -280,6 +380,25 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl) return retval; } +/* + * scsi_block_when_processing_errors() returns 0 when dev was taken offline by + * error recovery, 1 otherwise (i.e. okay). Even if in error recovery, let + * user continue if O_NONBLOCK set. Permits SCSI commands to be issued during + * error recovery. Tread carefully. + * Returns 0 for ok (i.e. allow), -EPROTO if sdp is NULL, otherwise -ENXIO . + */ +static inline int +sg_allow_if_err_recovery(struct sg_device *sdp, bool non_block) +{ + if (!sdp) + return -EPROTO; + if (non_block) + return 0; + if (likely(scsi_block_when_processing_errors(sdp->device))) + return 0; + return -ENXIO; +} + /* * Corresponds to the open() system call on sg devices. Implements O_EXCL on * a per device basis using 'open_cnt'. If O_EXCL and O_NONBLOCK and there is @@ -290,16 +409,16 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl) static int sg_open(struct inode *inode, struct file *filp) { - bool o_excl; + bool o_excl, non_block; int min_dev = iminor(inode); int op_flags = filp->f_flags; - struct request_queue *q; struct sg_device *sdp; struct sg_fd *sfp; int retval; nonseekable_open(inode, filp); o_excl = !!(op_flags & O_EXCL); + non_block = !!(op_flags & O_NONBLOCK); if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY)) return -EPERM; /* Can't lock it with read only access */ sdp = sg_get_dev(min_dev); @@ -320,15 +439,9 @@ sg_open(struct inode *inode, struct file *filp) if (retval) goto sdp_put; - /* scsi_block_when_processing_errors() may block so bypass - * check if O_NONBLOCK. Permits SCSI commands to be issued - * during error recovery. Tread carefully. 
*/ - if (!((op_flags & O_NONBLOCK) || - scsi_block_when_processing_errors(sdp->device))) { - retval = -ENXIO; - /* we are in error recovery for this device */ + retval = sg_allow_if_err_recovery(sdp, non_block); + if (retval) goto error_out; - } mutex_lock(&sdp->open_rel_lock); if (op_flags & O_NONBLOCK) { @@ -355,8 +468,8 @@ sg_open(struct inode *inode, struct file *filp) if (sdp->open_cnt < 1) { /* no existing opens */ sdp->sgdebug = 0; - q = sdp->device->request_queue; - sdp->sg_tablesize = queue_max_segments(q); + /* Next done in sg_alloc(), repeat here to pick up change? */ + sg_calc_sgat_param(sdp); } sfp = sg_add_sfp(sdp); if (IS_ERR(sfp)) { @@ -425,255 +538,278 @@ sg_release(struct inode *inode, struct file *filp) * the SCSI device by using write(2), ioctl(SG_IOSUBMIT) or the first half * of the synchronous ioctl(SG_IO) system call. */ + static ssize_t -sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) +sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) { - int mxsize, cmd_size, k; - int input_size, blocking; + int mxsize, cmd_size, input_size, res; u8 opcode; struct sg_device *sdp; struct sg_fd *sfp; struct sg_request *srp; - struct sg_header old_hdr; - sg_io_hdr_t *hp; u8 cmnd[SG_MAX_CDB_SIZE]; - int retval; - - retval = sg_check_file_access(filp, __func__); - if (retval) - return retval; + struct sg_header ov2hdr; + struct sg_io_hdr v3hdr; + struct sg_header *ohp = &ov2hdr; + struct sg_io_hdr *h3p = &v3hdr; + struct sg_comm_wr_t cwr; + + res = sg_check_file_access(filp, __func__); + if (res) + return res; sfp = filp->private_data; sdp = sfp->parentdp; SG_LOG(3, sdp, "%s: write(3rd arg) count=%d\n", __func__, (int)count); - if (!sdp) - return -ENXIO; - if (atomic_read(&sdp->detaching)) + res = sg_allow_if_err_recovery(sdp, !!(filp->f_flags & O_NONBLOCK)); + if (res) + return res; + if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; - if (!((filp->f_flags & O_NONBLOCK) || - scsi_block_when_processing_errors(sdp->device))) - return -ENXIO; - if (!access_ok(buf, count)) - return -EFAULT; /* protects following copy_from_user()s + get_user()s */ if (count < SZ_SG_HEADER) return -EIO; - if (__copy_from_user(&old_hdr, buf, SZ_SG_HEADER)) + if (copy_from_user(ohp, p, SZ_SG_HEADER)) return -EFAULT; - blocking = !(filp->f_flags & O_NONBLOCK); - if (old_hdr.reply_len < 0) - return sg_new_write(sfp, filp, buf, count, - blocking, 0, 0, NULL); - if (count < (SZ_SG_HEADER + 6)) - return -EIO; /* The minimum scsi command length is 6 bytes. */ + if (ohp->reply_len < 0) { /* assume this is v3 */ + struct sg_io_hdr *reinter_2p = (struct sg_io_hdr *)ohp; + u8 *h3u8p = (u8 *)&v3hdr; - if (!(srp = sg_add_request(sfp))) { - SG_LOG(1, sdp, "%s: queue full\n", __func__); - return -EDOM; + if (count < SZ_SG_IO_HDR) + return -EIO; + if (reinter_2p->interface_id != 'S') { + pr_info_once("sg: %s: v3 interface only here\n", + __func__); + return -EPERM; + } + memcpy(h3p, ohp, SZ_SG_HEADER); + if (__copy_from_user(h3u8p + SZ_SG_HEADER, p + SZ_SG_HEADER, + SZ_SG_IO_HDR - SZ_SG_HEADER)) + return -EFAULT; + res = sg_submit(filp, sfp, h3p, false, NULL); + return res < 0 ? 
res : (int)count; } - buf += SZ_SG_HEADER; - __get_user(opcode, buf); + /* v1 and v2 interfaces processed below this point */ + if (count < (SZ_SG_HEADER + 6)) + return -EIO; /* minimum scsi command length is 6 bytes */ + p += SZ_SG_HEADER; + __get_user(opcode, p); mutex_lock(&sfp->f_mutex); if (sfp->next_cmd_len > 0) { cmd_size = sfp->next_cmd_len; - sfp->next_cmd_len = 0; /* reset so only this write() effected */ + sfp->next_cmd_len = 0; /* reset, only this write() affected */ } else { - cmd_size = COMMAND_SIZE(opcode); /* based on SCSI command group */ - if ((opcode >= 0xc0) && old_hdr.twelve_byte) + cmd_size = COMMAND_SIZE(opcode);/* old; SCSI command group */ + if (opcode >= 0xc0 && ohp->twelve_byte) cmd_size = 12; } mutex_unlock(&sfp->f_mutex); SG_LOG(4, sdp, "%s: scsi opcode=0x%02x, cmd_size=%d\n", __func__, (unsigned int)opcode, cmd_size); input_size = count - cmd_size; - mxsize = max_t(int, input_size, old_hdr.reply_len); + mxsize = max_t(int, input_size, ohp->reply_len); mxsize -= SZ_SG_HEADER; input_size -= SZ_SG_HEADER; - if (input_size < 0) { - sg_remove_request(sfp, srp); - return -EIO; /* User did not pass enough bytes for this command. */ - } - hp = &srp->header; - hp->interface_id = '\0'; /* indicator of old interface tunnelled */ - hp->cmd_len = (u8) cmd_size; - hp->iovec_count = 0; - hp->mx_sb_len = 0; + if (input_size < 0) + return -EIO; /* Insufficient bytes passed for this command. */ + memset(h3p, 0, sizeof(*h3p)); + h3p->interface_id = '\0';/* indicate v1 or v2 interface (tunnelled) */ + h3p->cmd_len = (u8)cmd_size; + h3p->iovec_count = 0; + h3p->mx_sb_len = 0; if (input_size > 0) - hp->dxfer_direction = (old_hdr.reply_len > SZ_SG_HEADER) ? + h3p->dxfer_direction = (ohp->reply_len > SZ_SG_HEADER) ? SG_DXFER_TO_FROM_DEV : SG_DXFER_TO_DEV; else - hp->dxfer_direction = (mxsize > 0) ? SG_DXFER_FROM_DEV : SG_DXFER_NONE; - hp->dxfer_len = mxsize; - if ((hp->dxfer_direction == SG_DXFER_TO_DEV) || - (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV)) - hp->dxferp = (char __user *)buf + cmd_size; + h3p->dxfer_direction = (mxsize > 0) ? SG_DXFER_FROM_DEV : + SG_DXFER_NONE; + h3p->dxfer_len = mxsize; + if (h3p->dxfer_direction == SG_DXFER_TO_DEV || + h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV) + h3p->dxferp = (u8 __user *)p + cmd_size; else - hp->dxferp = NULL; - hp->sbp = NULL; - hp->timeout = old_hdr.reply_len; /* structure abuse ... */ - hp->flags = input_size; /* structure abuse ... */ - hp->pack_id = old_hdr.pack_id; - hp->usr_ptr = NULL; - if (__copy_from_user(cmnd, buf, cmd_size)) + h3p->dxferp = NULL; + h3p->sbp = NULL; + h3p->timeout = ohp->reply_len; /* structure abuse ... */ + h3p->flags = input_size; /* structure abuse ... */ + h3p->pack_id = ohp->pack_id; + h3p->usr_ptr = NULL; + cmnd[0] = opcode; + if (__copy_from_user(cmnd + 1, p + 1, cmd_size - 1)) return -EFAULT; /* * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV, - * but is is possible that the app intended SG_DXFER_TO_DEV, because there - * is a non-zero input_size, so emit a warning. + * but it is possible that the app intended SG_DXFER_TO_DEV, because + * there is a non-zero input_size, so emit a warning.
*/ - if (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV) { printk_ratelimited(KERN_WARNING - "sg_write: data in/out %d/%d bytes " - "for SCSI command 0x%x-- guessing " - "data in;\n program %s not setting " - "count and/or reply_len properly\n", - old_hdr.reply_len - (int)SZ_SG_HEADER, - input_size, (unsigned int) cmnd[0], - current->comm); - } - k = sg_common_write(sfp, srp, cmnd, sfp->timeout, blocking); - return (k < 0) ? k : count; + if (h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV) { printk_ratelimited(KERN_WARNING + "%s: data in/out %d/%d bytes for SCSI command 0x%x-- guessing data in;\n" + " program %s not setting count and/or reply_len properly\n", + __func__, ohp->reply_len - (int)SZ_SG_HEADER, + input_size, (unsigned int)cmnd[0], current->comm); + } + cwr.frq_bm[0] = 0; /* initial state clear for all req flags */ + cwr.h3p = h3p; + cwr.timeout = sfp->timeout; + cwr.cmnd = cmnd; + srp = sg_common_write(sfp, &cwr); + return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count; } -static int -sg_allow_access(struct file *filp, u8 *cmd) +static inline int +sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len) { - struct sg_fd *sfp = filp->private_data; - - if (sfp->parentdp->device->type == TYPE_SCANNER) - return 0; + if (!list_empty(&sfp->rq_list)) + return -EBUSY; /* already active requests on fd */ + if (len > sfp->rsv_srp->sgat_h.buflen) + return -ENOMEM; /* MMAP_IO size must fit in reserve */ + if (rq_flags & SG_FLAG_DIRECT_IO) + return -EINVAL; /* not both MMAP_IO and DIRECT_IO */ + return 0; +} - return blk_verify_command(cmd, filp->f_mode); +static int +sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp, + int len, u8 *cdbp) +{ + if (!u_cdbp || len < 6 || len > SG_MAX_CDB_SIZE) + return -EMSGSIZE; + if (copy_from_user(cdbp, u_cdbp, len)) + return -EFAULT; + if (O_RDWR != (filp->f_flags & O_ACCMODE)) { /* read-only */ + switch (sfp->parentdp->device->type) { + case TYPE_DISK: + case TYPE_RBC: + case TYPE_ZBC: + return blk_verify_command(cdbp, filp->f_mode); + default: /* SSC, SES, etc cdb_s may differ from SBC */ + break; + } + } + return 0; } -static ssize_t -sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf, - size_t count, int blocking, int read_only, int sg_io_owned, - struct sg_request **o_srp) +static int +sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, + bool sync, struct sg_request **o_srp) { - int k; + int res, timeout; + unsigned long ul_timeout; struct sg_request *srp; - sg_io_hdr_t *hp; + struct sg_comm_wr_t cwr; u8 cmnd[SG_MAX_CDB_SIZE]; - int timeout; - unsigned long ul_timeout; - - if (count < SZ_SG_IO_HDR) - return -EINVAL; - if (!access_ok(buf, count)) - return -EFAULT; /* protects following copy_from_user()s + get_user()s */ - sfp->cmd_q = 1; /* when sg_io_hdr seen, set command queuing on */ - if (!(srp = sg_add_request(sfp))) { - SG_LOG(1, sfp->parentdp, "%s: queue full\n", __func__); - return -EDOM; - } - srp->sg_io_owned = sg_io_owned; - hp = &srp->header; - if (__copy_from_user(hp, buf, SZ_SG_IO_HDR)) { - sg_remove_request(sfp, srp); - return -EFAULT; - } - if (hp->interface_id != 'S') { - sg_remove_request(sfp, srp); - return -ENOSYS; - } + /* now doing v3 blocking (sync) or non-blocking submission */ if (hp->flags & SG_FLAG_MMAP_IO) { - if (hp->dxfer_len > sfp->reserve.bufflen) { - sg_remove_request(sfp, srp); - return -ENOMEM; /* MMAP_IO size must fit in reserve buffer */ - } - if (hp->flags & SG_FLAG_DIRECT_IO) { - sg_remove_request(sfp, srp); - return -EINVAL; /* either MMAP_IO or DIRECT_IO (not both) */ - } - if
(sfp->res_in_use) { - sg_remove_request(sfp, srp); - return -EBUSY; /* reserve buffer already being used */ - } - } - ul_timeout = msecs_to_jiffies(srp->header.timeout); - timeout = (ul_timeout < INT_MAX) ? ul_timeout : INT_MAX; - if ((!hp->cmdp) || (hp->cmd_len < 6) || (hp->cmd_len > sizeof (cmnd))) { - sg_remove_request(sfp, srp); - return -EMSGSIZE; - } - if (!access_ok(hp->cmdp, hp->cmd_len)) { - sg_remove_request(sfp, srp); - return -EFAULT; /* protects following copy_from_user()s + get_user()s */ - } - if (__copy_from_user(cmnd, hp->cmdp, hp->cmd_len)) { - sg_remove_request(sfp, srp); - return -EFAULT; - } - if (read_only && sg_allow_access(file, cmnd)) { - sg_remove_request(sfp, srp); - return -EPERM; + res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len); + if (res) + return res; } - k = sg_common_write(sfp, srp, cmnd, timeout, blocking); - if (k < 0) - return k; + /* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */ + set_bit(SG_FFD_CMD_Q, sfp->ffd_bm); + ul_timeout = msecs_to_jiffies(hp->timeout); + timeout = min_t(unsigned long, ul_timeout, INT_MAX); + res = sg_fetch_cmnd(filp, sfp, hp->cmdp, hp->cmd_len, cmnd); + if (res) + return res; + cwr.frq_bm[0] = 0; + assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync); + cwr.h3p = hp; + cwr.timeout = timeout; + cwr.cmnd = cmnd; + srp = sg_common_write(sfp, &cwr); + if (IS_ERR(srp)) + return PTR_ERR(srp); if (o_srp) *o_srp = srp; - return count; + return 0; } -static int -sg_common_write(struct sg_fd *sfp, struct sg_request *srp, - u8 *cmnd, int timeout, int blocking) +/* + * All writes and submits converge on this function to launch the SCSI + * command/request (via blk_execute_rq_nowait). Returns a pointer to a + * sg_request object holding the request just issued or a negated errno + * value twisted by ERR_PTR. 
+ */ +static struct sg_request * +sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) { - int k, at_head; + bool at_head; + int res = 0; + int dxfr_len, dir, cmd_len; + int pack_id = SG_PACK_ID_WILDCARD; + u32 rq_flags; struct sg_device *sdp = sfp->parentdp; - sg_io_hdr_t *hp = &srp->header; - - srp->data.cmd_opcode = cmnd[0]; /* hold opcode of command */ - hp->status = 0; - hp->masked_status = 0; - hp->msg_status = 0; - hp->info = 0; - hp->host_status = 0; - hp->driver_status = 0; - hp->resid = 0; - SG_LOG(4, sfp->parentdp, "%s: opcode=0x%02x, cmd_sz=%d\n", __func__, - (int)cmnd[0], hp->cmd_len); - - if (hp->dxfer_len >= SZ_256M) - return -EINVAL; - - k = sg_start_req(srp, cmnd); - if (k) { - SG_LOG(1, sfp->parentdp, "%s: start_req err=%d\n", __func__, - k); - sg_finish_rem_req(srp); - sg_remove_request(sfp, srp); - return k; /* probably out of space --> ENOMEM */ + struct sg_request *srp; + struct sg_io_hdr *hi_p; + + hi_p = cwrp->h3p; + dir = hi_p->dxfer_direction; + dxfr_len = hi_p->dxfer_len; + rq_flags = hi_p->flags; + pack_id = hi_p->pack_id; + if (dxfr_len >= SZ_256M) + return ERR_PTR(-EINVAL); + + srp = sg_add_request(sfp, dxfr_len, cwrp); + if (IS_ERR(srp)) + return srp; + srp->rq_flags = rq_flags; + srp->pack_id = pack_id; + + cmd_len = hi_p->cmd_len; + memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3)); + srp->cmd_opcode = cwrp->cmnd[0];/* hold opcode of command for debug */ + SG_LOG(4, sdp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__, + (int)cwrp->cmnd[0], cmd_len, pack_id); + + res = sg_start_req(srp, cwrp->cmnd, cmd_len, dir); + if (res < 0) /* probably out of space --> -ENOMEM */ + goto err_out; + if (unlikely(atomic_read(&sdp->detaching))) { + res = -ENODEV; + goto err_out; } - if (atomic_read(&sdp->detaching)) { - if (srp->bio) { - scsi_req_free_cmd(scsi_req(srp->rq)); - blk_put_request(srp->rq); - srp->rq = NULL; - } - - sg_finish_rem_req(srp); - sg_remove_request(sfp, srp); - return -ENODEV; + if (unlikely(test_bit(SG_FRQ_BLK_PUT_REQ, srp->frq_bm) || !srp->rq)) { + res = -EIDRM; /* this failure unexpected but observed */ + goto err_out; } - - hp->duration = jiffies_to_msecs(jiffies); - if (hp->interface_id != '\0' && /* v3 (or later) interface */ - (SG_FLAG_Q_AT_TAIL & hp->flags)) - at_head = 0; - else - at_head = 1; - - srp->rq->timeout = timeout; + srp->rq->timeout = cwrp->timeout; kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */ + srp->start_ns = ktime_get_boot_ns(); + srp->duration = 0; + + if (srp->s_hdr3.interface_id == '\0') + at_head = true; /* backward compatibility: v1+v2 interfaces */ + else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm)) + /* cmd flags can override sfd setting */ + at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD); + else /* this sfd is defaulting to head */ + at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL); blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk, srp->rq, at_head, sg_rq_end_io); - return 0; + return srp; +err_out: + sg_finish_scsi_blk_rq(srp); + sg_deact_request(sfp, srp); + return ERR_PTR(res); } +static inline int +sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st, + enum sg_rq_state new_st) +{ + enum sg_rq_state act_old_st = (enum sg_rq_state) + atomic_cmpxchg(&srp->rq_st, old_st, new_st); + + if (act_old_st == old_st) + return 0; /* implies new_st --> srp->rq_st */ + else if (IS_ENABLED(CONFIG_SCSI_LOGGING)) + sg_rep_rq_state_fail(srp->parentfp->parentdp, old_st, new_st, + act_old_st); + return -EPROTOTYPE; +} /* * read(2) related functions follow. 
They are shown after write(2) related @@ -681,238 +817,309 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp, * half of the ioctl(SG_IO) share code with read(2). */ -static struct sg_request * -sg_get_rq_mark(struct sg_fd *sfp, int pack_id) +/* + * This function is called by wait_event_interruptible in sg_read() and + * sg_ctl_ioreceive(). wait_event_interruptible will return if this one + * returns true (or an event like a signal (e.g. control-C) occurs). + */ +static inline bool +sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id) { - struct sg_request *resp; - unsigned long iflags; + struct sg_request *srp; - write_lock_irqsave(&sfp->rq_list_lock, iflags); - list_for_each_entry(resp, &sfp->rq_list, entry) { - /* look for requests that are ready + not SG_IO owned */ - if ((resp->done == 1) && (!resp->sg_io_owned) && - ((-1 == pack_id) || (resp->header.pack_id == pack_id))) { - resp->done = 2; /* guard against other readers */ - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return resp; - } + if (unlikely(atomic_read(&sfp->parentdp->detaching))) { + *srpp = NULL; + return true; } - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return NULL; + srp = sg_find_srp_by_id(sfp, pack_id); + *srpp = srp; + return !!srp; } -static ssize_t -sg_new_read(struct sg_fd *sfp, char __user *buf, size_t count, - struct sg_request *srp) +/* + * Returns number of bytes copied to user space provided sense buffer or + * negated errno value. + */ +static int +sg_copy_sense(struct sg_request *srp) { - sg_io_hdr_t *hp = &srp->header; - int err = 0, err2; - int len; - - if (count < SZ_SG_IO_HDR) { - err = -EINVAL; - goto err_out; - } - hp->sb_len_wr = 0; - if ((hp->mx_sb_len > 0) && hp->sbp) { - if ((CHECK_CONDITION & hp->masked_status) || - (DRIVER_SENSE & hp->driver_status)) { - int sb_len = SCSI_SENSE_BUFFERSIZE; - - sb_len = (hp->mx_sb_len > sb_len) ? sb_len : - hp->mx_sb_len; + int sb_len_wr = 0; + int scsi_stat; + + /* If need be, copy the sense buffer to the user space */ + scsi_stat = srp->rq_result & 0xff; + if ((scsi_stat & SAM_STAT_CHECK_CONDITION) || + (driver_byte(srp->rq_result) & DRIVER_SENSE)) { + int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len); + int mx_sb_len; + void __user *up; + + up = (void __user *)srp->s_hdr3.sbp; + mx_sb_len = srp->s_hdr3.mx_sb_len; + if (up && mx_sb_len > 0 && srp->sense_bp) { + sb_len = min_t(int, sb_len, mx_sb_len); /* Additional sense length field */ - len = 8 + (int) srp->sense_b[7]; - len = (len > sb_len) ? sb_len : len; - if (copy_to_user(hp->sbp, srp->sense_b, len)) { - err = -EFAULT; - goto err_out; - } - hp->sb_len_wr = len; + sb_len_wr = 8 + (int)srp->sense_bp[7]; + sb_len_wr = min_t(int, sb_len, sb_len_wr); + if (copy_to_user(up, srp->sense_bp, sb_len_wr)) + sb_len_wr = -EFAULT; } + kfree(srp->sense_bp); + srp->sense_bp = NULL; } - if (hp->masked_status || hp->host_status || hp->driver_status) - hp->info |= SG_INFO_CHECK; - if (copy_to_user(buf, hp, SZ_SG_IO_HDR)) { - err = -EFAULT; - goto err_out; - } -err_out: - err2 = sg_finish_rem_req(srp); - sg_remove_request(sfp, srp); - return err ? : err2 ? 
: count; + return sb_len_wr; } -static int -srp_done(struct sg_fd *sfp, struct sg_request *srp) +#if IS_ENABLED(CONFIG_SCSI_LOGGING) +static void +sg_rep_rq_state_fail(struct sg_device *sdp, enum sg_rq_state exp_old_st, + enum sg_rq_state want_st, enum sg_rq_state act_old_st) { - unsigned long flags; - int ret; + const char *eors = "expected old rq_st: "; + const char *aors = "actual old rq_st: "; - read_lock_irqsave(&sfp->rq_list_lock, flags); - ret = srp->done; - read_unlock_irqrestore(&sfp->rq_list_lock, flags); - return ret; + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) + SG_LOG(1, sdp, "%s: %s%s, %s%s, wanted rq_st: %s\n", __func__, + eors, sg_rq_st_str(exp_old_st, false), + aors, sg_rq_st_str(act_old_st, false), + sg_rq_st_str(want_st, false)); + else + pr_info("sg: %s: %s%d, %s%d, wanted rq_st: %d\n", __func__, + eors, (int)exp_old_st, aors, (int)act_old_st, + (int)want_st); } - -static ssize_t -sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos) +#else +static void +sg_rep_rq_state_fail(struct sg_device *sdp, enum sg_rq_state exp_old_st, + enum sg_rq_state want_st, enum sg_rq_state act_old_st) { - struct sg_device *sdp; - struct sg_fd *sfp; - struct sg_request *srp; - int req_pack_id = -1; - sg_io_hdr_t *hp; - struct sg_header *old_hdr = NULL; - int retval = 0; - - /* - * This could cause a response to be stranded. Close the associated - * file descriptor to free up any resources being held. - */ - retval = sg_check_file_access(filp, __func__); - if (retval) - return retval; +} +#endif - sfp = filp->private_data; - sdp = sfp->parentdp; - SG_LOG(3, sdp, "%s: read() count=%d\n", __func__, (int)count); - if (!sdp) - return -ENXIO; +static int +sg_rec_v3_state(struct sg_fd *sfp, struct sg_request *srp) +{ + int err = 0; + int sb_len_wr; + u32 rq_res = srp->rq_result; + + sb_len_wr = sg_copy_sense(srp); + if (sb_len_wr < 0) + return sb_len_wr; + if (rq_res & SG_ML_RESULT_MSK) + srp->rq_info |= SG_INFO_CHECK; + if (unlikely(atomic_read(&sfp->parentdp->detaching))) + srp->rq_info |= SG_INFO_DEVICE_DETACHING; + return err; +} - if (!access_ok(buf, count)) - return -EFAULT; - if (sfp->force_packid && (count >= SZ_SG_HEADER)) { - old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL); - if (!old_hdr) - return -ENOMEM; - if (__copy_from_user(old_hdr, buf, SZ_SG_HEADER)) { - retval = -EFAULT; - goto free_old_hdr; - } - if (old_hdr->reply_len < 0) { - if (count >= SZ_SG_IO_HDR) { - sg_io_hdr_t *new_hdr; - - new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL); - if (!new_hdr) { - retval = -ENOMEM; - goto free_old_hdr; - } - retval = __copy_from_user - (new_hdr, buf, SZ_SG_IO_HDR); - req_pack_id = new_hdr->pack_id; - kfree(new_hdr); - if (retval) { - retval = -EFAULT; - goto free_old_hdr; - } - } - } else - req_pack_id = old_hdr->pack_id; - } - srp = sg_get_rq_mark(sfp, req_pack_id); - if (!srp) { /* now wait on packet to arrive */ - if (atomic_read(&sdp->detaching)) { - retval = -ENODEV; - goto free_old_hdr; - } - if (filp->f_flags & O_NONBLOCK) { - retval = -EAGAIN; - goto free_old_hdr; - } - retval = wait_event_interruptible(sfp->read_wait, - (atomic_read(&sdp->detaching) || - (srp = sg_get_rq_mark(sfp, req_pack_id)))); - if (atomic_read(&sdp->detaching)) { - retval = -ENODEV; - goto free_old_hdr; - } - if (retval) { - /* -ERESTARTSYS as signal hit process */ - goto free_old_hdr; - } - } - if (srp->header.interface_id != '\0') { - retval = sg_new_read(sfp, buf, count, srp); - goto free_old_hdr; - } +/* + * Completes a v3 request/command. 
Called from sg_read {v2 or v3}, + ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when it is + completing a v3 request/command. + */ +static int +sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) +{ + int err, err2; + int rq_result = srp->rq_result; + struct sg_io_hdr hdr3; + struct sg_io_hdr *hp = &hdr3; + + SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp); + err = sg_rec_v3_state(sfp, srp); + memset(hp, 0, sizeof(*hp)); + memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3)); + hp->sb_len_wr = srp->sense_len; + hp->info = srp->rq_info; + hp->resid = srp->in_resid; + hp->duration = srp->duration; + hp->status = rq_result & 0xff; + hp->masked_status = status_byte(rq_result); + hp->msg_status = msg_byte(rq_result); + hp->host_status = host_byte(rq_result); + hp->driver_status = driver_byte(rq_result); + /* copy_*_user() [without leading underscores] checks access */ + if (copy_to_user(p, hp, SZ_SG_IO_HDR)) + err = err ? err : -EFAULT; + err2 = sg_rstate_chg(srp, atomic_read(&srp->rq_st), SG_RS_DONE_RD); + if (err2) + err = err ? err : err2; + sg_finish_scsi_blk_rq(srp); + sg_deact_request(sfp, srp); + return err ? err : 0; +} static int sg_rd_v1v2(void __user *buf, int count, struct sg_fd *sfp, + struct sg_request *srp) +{ + int res = 0; + u32 rq_result = srp->rq_result; + struct sg_header *h2p; + struct sg_slice_hdr3 *sh3p; + struct sg_header a_v2hdr; + + h2p = &a_v2hdr; + memset(h2p, 0, SZ_SG_HEADER); + sh3p = &srp->s_hdr3; + h2p->reply_len = (int)sh3p->timeout; + h2p->pack_len = h2p->reply_len; /* old, strange behaviour */ + h2p->pack_id = sh3p->pack_id; + h2p->twelve_byte = (srp->cmd_opcode >= 0xc0 && sh3p->cmd_len == 12); + h2p->target_status = status_byte(rq_result); + h2p->host_status = host_byte(rq_result); + h2p->driver_status = driver_byte(rq_result); + if ((CHECK_CONDITION & status_byte(rq_result)) || + (DRIVER_SENSE & driver_byte(rq_result))) { + if (srp->sense_bp) { + memcpy(h2p->sense_buffer, srp->sense_bp, + sizeof(h2p->sense_buffer)); + kfree(srp->sense_bp); + srp->sense_bp = NULL; } } - memset(old_hdr, 0, SZ_SG_HEADER); - old_hdr->reply_len = (int) hp->timeout; - old_hdr->pack_len = old_hdr->reply_len; /* old, strange behaviour */ - old_hdr->pack_id = hp->pack_id; - old_hdr->twelve_byte = - ((srp->data.cmd_opcode >= 0xc0) && (hp->cmd_len == 12)) ? 1 : 0; - old_hdr->target_status = hp->masked_status; - old_hdr->host_status = hp->host_status; - old_hdr->driver_status = hp->driver_status; - if ((hp->masked_status & CHECK_CONDITION) || - (hp->driver_status & DRIVER_SENSE)) - memcpy(old_hdr->sense_buffer, srp->sense_b, - sizeof(old_hdr->sense_buffer)); - switch (hp->host_status) { + switch (host_byte(rq_result)) { /* - * This setup of 'result' is for backward compatibility and is best - * ignored by the user who should use target, host + driver status + * The following setting of 'result' is for backward compatibility + * and is best ignored by the user who should use target, host and + * driver status.
*/ case DID_OK: case DID_PASSTHROUGH: case DID_SOFT_ERROR: - old_hdr->result = 0; + h2p->result = 0; break; case DID_NO_CONNECT: case DID_BUS_BUSY: case DID_TIME_OUT: - old_hdr->result = EBUSY; + h2p->result = EBUSY; break; case DID_BAD_TARGET: case DID_ABORT: case DID_PARITY: case DID_RESET: case DID_BAD_INTR: - old_hdr->result = EIO; + h2p->result = EIO; break; case DID_ERROR: - old_hdr->result = (srp->sense_b[0] == 0 && - hp->masked_status == GOOD) ? 0 : EIO; + h2p->result = (srp->sense_bp && + status_byte(rq_result) == GOOD) ? 0 : EIO; break; default: - old_hdr->result = EIO; + h2p->result = EIO; break; } /* Now copy the result back to the user buffer. */ if (count >= SZ_SG_HEADER) { - if (__copy_to_user(buf, old_hdr, SZ_SG_HEADER)) { - retval = -EFAULT; - goto free_old_hdr; - } + if (copy_to_user(buf, h2p, SZ_SG_HEADER)) + return -EFAULT; buf += SZ_SG_HEADER; - if (count > old_hdr->reply_len) - count = old_hdr->reply_len; + if (count > h2p->reply_len) + count = h2p->reply_len; if (count > SZ_SG_HEADER) { - if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) { - retval = -EFAULT; - goto free_old_hdr; - } + if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) + return -EFAULT; } } else - count = (old_hdr->result == 0) ? 0 : -EIO; - sg_finish_rem_req(srp); - sg_remove_request(sfp, srp); - retval = count; -free_old_hdr: - kfree(old_hdr); - return retval; + res = (h2p->result == 0) ? 0 : -EIO; + atomic_set(&srp->rq_st, SG_RS_DONE_RD); + sg_finish_scsi_blk_rq(srp); + sg_deact_request(sfp, srp); + return res; +} + +/* + * This is the read(2) system call entry point (see sg_fops) for this driver. + * Accepts v1, v2 or v3 type headers (not v4). Returns count or negated + * errno; if count is 0 then v3: returns -EINVAL; v1+v2: 0 when no other + * error detected or -EIO. + */ +static ssize_t +sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos) +{ + bool could_be_v3; + bool non_block = !!(filp->f_flags & O_NONBLOCK); + int want_id = SG_PACK_ID_WILDCARD; + int hlen, ret; + struct sg_device *sdp = NULL; + struct sg_fd *sfp; + struct sg_request *srp = NULL; + struct sg_header *h2p = NULL; + struct sg_io_hdr a_sg_io_hdr; + + /* + * This could cause a response to be stranded. Close the associated + * file descriptor to free up any resources being held. + */ + ret = sg_check_file_access(filp, __func__); + if (ret) + return ret; + + sfp = filp->private_data; + sdp = sfp->parentdp; + SG_LOG(3, sdp, "%s: read() count=%d\n", __func__, (int)count); + ret = sg_allow_if_err_recovery(sdp, non_block); + if (ret) + return ret; + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + + could_be_v3 = (count >= SZ_SG_IO_HDR); + hlen = could_be_v3 ? SZ_SG_IO_HDR : SZ_SG_HEADER; + h2p = (struct sg_header *)&a_sg_io_hdr; + + if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm) && (int)count >= hlen) { + /* + * Even though this is a user space read() system call, this + * code is cheating to fetch the pack_id. + * Only need first three 32 bit ints to determine interface. 
+ */ + if (copy_from_user(h2p, p, 3 * sizeof(int))) + return -EFAULT; + if (h2p->reply_len < 0 && could_be_v3) { + struct sg_io_hdr *v3_hdr = (struct sg_io_hdr *)h2p; + + if (v3_hdr->interface_id == 'S') {/* v3, new sanity */ + struct sg_io_hdr __user *h3_up; + + h3_up = (struct sg_io_hdr __user *)p; + ret = get_user(want_id, &h3_up->pack_id); + if (ret) + return ret; + } else { + return -EPERM; + } + } else { /* for v1+v2 interfaces, this is the 3rd integer */ + want_id = h2p->pack_id; + } + } + srp = sg_find_srp_by_id(sfp, want_id); + if (!srp) { /* nothing available so wait on packet to arrive or */ + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */ + return -EAGAIN; + ret = wait_event_interruptible(sfp->read_wait, + sg_get_ready_srp(sfp, &srp, + want_id)); + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + if (ret) /* -ERESTARTSYS as signal hit process */ + return ret; + /* otherwise srp should be valid */ + } + if (srp->s_hdr3.interface_id == '\0') { + ret = sg_rd_v1v2(p, (int)count, sfp, srp); + } else { + if ((int)count < SZ_SG_IO_HDR) + return -EINVAL; + ret = sg_v3_receive(sfp, srp, p); + } + if (ret < 0) + SG_LOG(1, sdp, "%s: negated errno: %d\n", __func__, ret); + return ret < 0 ? ret : (int)count; } static int @@ -921,289 +1128,573 @@ max_sectors_bytes(struct request_queue *q) unsigned int max_sectors = queue_max_sectors(q); max_sectors = min_t(unsigned int, max_sectors, INT_MAX >> 9); - return max_sectors << 9; } +/* + * Calculates sg_device::max_sgat_elems and sg_device::max_sgat_sz. It uses + * the device's request queue. If q not available sets max_sgat_elems to 1 + * and max_sgat_sz to PAGE_SIZE. If potential max_sgat_sz is greater than + * 2^30 scales down the implied max_segment_size so the product of the + * max_segment_size and max_sgat_elems is less than or equal to 2^30 . + */ static void -sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo) +sg_calc_sgat_param(struct sg_device *sdp) { - struct sg_request *srp; - int val; - unsigned int ms; + int sz; + u64 m; + struct scsi_device *sdev = sdp->device; + struct request_queue *q = sdev ? sdev->request_queue : NULL; + + if (!q) { + sdp->max_sgat_elems = 1; + sdp->max_sgat_sz = PAGE_SIZE; + return; + } + sdp->max_sgat_elems = queue_max_segments(q); + m = (u64)queue_max_segment_size(q) * queue_max_segments(q); + if (m < PAGE_SIZE) { + sdp->max_sgat_elems = 1; + sdp->max_sgat_sz = PAGE_SIZE; + return; + } + sz = (int)min_t(u64, m, 1 << 30); + if (sz == (1 << 30)) /* round down so: sz = elems * elem_sz */ + sz = ((1 << 30) / sdp->max_sgat_elems) * sdp->max_sgat_elems; + sdp->max_sgat_sz = sz; +} - val = 0; - list_for_each_entry(srp, &sfp->rq_list, entry) { - if (val >= SG_MAX_QUEUE) +static u32 +sg_calc_rq_dur(const struct sg_request *srp) +{ + ktime_t ts0 = srp->start_ns; + ktime_t now_ts; + s64 diff; + + if (ts0 == 0) + return 0; + if (unlikely(ts0 == U64_MAX)) /* _prior_ to issuing req */ + return 999999999; /* eye catching */ + now_ts = ktime_get_boot_ns(); + if (unlikely(ts0 > now_ts)) + return 999999998; + /* unlikely req duration will exceed 2**32 milliseconds */ + diff = ktime_ms_delta(now_ts, ts0); + return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff; +} + +/* Return of U32_MAX means srp is inactive or in slave waiting state */ +static u32 +sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp, + bool *is_durp) +{ + bool is_dur = false; + u32 res = U32_MAX; + + switch (sr_stp ? 
*sr_stp : atomic_read(&srp->rq_st)) { + case SG_RS_INFLIGHT: + case SG_RS_BUSY: + res = sg_calc_rq_dur(srp); + break; + case SG_RS_AWAIT_RD: + case SG_RS_DONE_RD: + res = srp->duration; + is_dur = true; /* completion has occurred, timing finished */ + break; + case SG_RS_INACTIVE: + default: + break; + } + if (is_durp) + *is_durp = is_dur; + return res; +} + +static void +sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, + struct sg_req_info *rip) + __must_hold(&sfp->rq_list_lock) +{ + spin_lock(&srp->req_lck); + rip->duration = sg_get_dur(srp, NULL, NULL); + if (rip->duration == U32_MAX) + rip->duration = 0; + rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm); + rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); + rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK); + rip->pack_id = srp->pack_id; + rip->usr_ptr = srp->s_hdr3.usr_ptr; + spin_unlock(&srp->req_lck); +} + +static inline bool +sg_rq_landed(struct sg_device *sdp, struct sg_request *srp) +{ + return atomic_read(&srp->rq_st) != SG_RS_INFLIGHT || + unlikely(atomic_read(&sdp->detaching)); +} + +/* + * This is a blocking wait for a specific srp. When h4p is non-NULL, it is + * the blocking multiple request case + */ +static int +sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p, + struct sg_request *srp) +{ + int res; + enum sg_rq_state sr_st; + struct sg_device *sdp = sfp->parentdp; + + SG_LOG(3, sdp, "%s: about to wait_event...()\n", __func__); + /* usually will be woken up by sg_rq_end_io() callback */ + res = wait_event_interruptible(sfp->read_wait, + sg_rq_landed(sdp, srp)); + if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */ + set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm); + /* orphans harvested when sfp->keep_orphan is false */ + atomic_set(&srp->rq_st, SG_RS_INFLIGHT); + SG_LOG(1, sdp, "%s: wait_event_interruptible gave %d\n", + __func__, res); + return res; + } + if (unlikely(atomic_read(&sdp->detaching))) { + atomic_set(&srp->rq_st, SG_RS_INACTIVE); + return -ENODEV; + } + sr_st = atomic_read(&srp->rq_st); + if (unlikely(sr_st != SG_RS_AWAIT_RD)) + return -EPROTO; /* Logic error */ + res = sg_rstate_chg(srp, sr_st, SG_RS_BUSY); + if (unlikely(res)) + return res; + res = sg_v3_receive(sfp, srp, p); + return (res < 0) ? res : 0; +} + +/* + * Handles ioctl(SG_IO) for blocking (sync) usage of v3 or v4 interface. + * Returns 0 on success else a negated errno. + */ +static int +sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, + void __user *p) +{ + int res; + struct sg_request *srp = NULL; + u8 hu8arr[SZ_SG_IO_HDR]; + struct sg_io_hdr *h3p = (struct sg_io_hdr *)hu8arr; + + SG_LOG(3, sdp, "%s: SG_IO%s\n", __func__, + ((filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" : "")); + res = sg_allow_if_err_recovery(sdp, false); + if (res) + return res; + if (atomic_read(&sdp->detaching)) + return -ENODEV; + if (copy_from_user(h3p, p, SZ_SG_IO_HDR)) + return -EFAULT; + if (h3p->interface_id == 'S') + res = sg_submit(filp, sfp, h3p, true, &srp); + else + return -EPERM; + if (unlikely(res < 0)) + return res; + if (!srp) /* mrq case: already processed all responses */ + return res; + res = sg_wait_event_srp(filp, sfp, p, srp); + if (res) { + SG_LOG(1, sdp, "%s: %s=0x%p state: %s\n", __func__, + "unexpected srp", srp, + sg_rq_st_str(atomic_read(&srp->rq_st), false)); + } + return res; +} + +/* + * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and + * <= max_segment_size. 
Exit if that is the same as old size; otherwise + * create a new candidate request of the new size. Then decide whether to + * re-use an existing free list request (least buflen >= required size) or + * use the new candidate. If new one used, leave old one but it is no longer + * the reserved request. Returns 0 on success, else a negated errno value. + */ +static int +sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) + __must_hold(&sfp->f_mutex) +{ + bool use_new_srp = false; + int res = 0; + int new_sz, blen; + unsigned long iflags; + struct sg_request *o_srp; /* prior reserve sg_request */ + struct sg_request *n_srp; /* new sg_request, may be used */ + struct sg_request *t_srp; /* other fl entries */ + struct sg_device *sdp = sfp->parentdp; + + rcu_read_lock(); + o_srp = sfp->rsv_srp; + if (!o_srp) { + rcu_read_unlock(); + return -EPROTO; + } + new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz); + new_sz = max_t(int, new_sz, sfp->sgat_elem_sz); + blen = o_srp->sgat_h.buflen; + rcu_read_unlock(); + SG_LOG(3, sdp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n", + __func__, blen, want_rsv_sz, new_sz, sfp->sgat_elem_sz); + if (blen == new_sz) + return 0; + n_srp = sg_mk_srp_sgat(sfp, true /* can take time */, new_sz); + if (IS_ERR(n_srp)) + return PTR_ERR(n_srp); + /* new sg_request object, sized correctly is now available */ + spin_lock_irqsave(&sfp->rq_list_lock, iflags); + o_srp = sfp->rsv_srp; + if (!o_srp) { + res = -EPROTO; + goto wr_unlock; + } + if (SG_RS_ACTIVE(o_srp) || + test_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm)) { + res = -EBUSY; + goto wr_unlock; + } + use_new_srp = true; + list_for_each_entry(t_srp, &sfp->rq_fl, fl_entry) { + if (t_srp != o_srp && new_sz <= t_srp->sgat_h.buflen && + !SG_RS_ACTIVE(t_srp)) { + /* good candidate on free list, use */ + use_new_srp = false; + sfp->rsv_srp = t_srp; break; - rinfo[val].req_state = srp->done + 1; - rinfo[val].problem = - srp->header.masked_status & - srp->header.host_status & - srp->header.driver_status; - if (srp->done) - rinfo[val].duration = - srp->header.duration; - else { - ms = jiffies_to_msecs(jiffies); - rinfo[val].duration = - (ms > srp->header.duration) ? - (ms - srp->header.duration) : 0; } - rinfo[val].orphan = srp->orphan; - rinfo[val].sg_io_owned = srp->sg_io_owned; - rinfo[val].pack_id = srp->header.pack_id; - rinfo[val].usr_ptr = srp->header.usr_ptr; - val++; } + if (use_new_srp) { + sfp->rsv_srp = n_srp; + /* add to front of free list */ + list_add_rcu(&n_srp->fl_entry, &sfp->rq_fl); + SG_LOG(6, sdp, "%s: new rsv srp=0x%p ++\n", __func__, n_srp); + } +wr_unlock: + spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); + if (!use_new_srp) { + sg_remove_sgat(n_srp); + kfree(n_srp); /* no-one else has seen n_srp, so safe */ + } + return res; +} + +/* + * For backward compatibility, output SG_MAX_QUEUE sg_req_info objects. First + * fetch from the active list then, if there is still room, from the free + * list. Some of the trailing elements may be empty which is indicated by all + * fields being zero. Any requests beyond SG_MAX_QUEUE are ignored. 
+ */ +static int +sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p) +{ + int k, len, result, val, n; + struct sg_request *srp; + struct sg_req_info *rinfop; + struct sg_req_info *rip; + + SG_LOG(3, sfp->parentdp, "%s: SG_GET_REQUEST_TABLE\n", __func__); + k = SG_MAX_QUEUE; + len = SZ_SG_REQ_INFO * k; + rinfop = kcalloc(k, SZ_SG_REQ_INFO, GFP_KERNEL); + if (!rinfop) + return -ENOMEM; + val = 0; + rcu_read_lock(); + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) + ++val; + + if (val > 0) { + int max_num = min_t(int, k, val); + + n = 0; + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (n >= max_num) + break; + rip = &rinfop[n]; + sg_fill_request_element(sfp, srp, rip); + n++; + } + list_for_each_entry_rcu(srp, &sfp->rq_fl, fl_entry) { + if (n >= max_num) + break; + rip = &rinfop[n]; + sg_fill_request_element(sfp, srp, rip); + n++; + } + } + rcu_read_unlock(); + result = copy_to_user(p, rinfop, len); + result = result ? -EFAULT : 0; + kfree(rinfop); + return result; +} + +static int +sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_device *sdp, void __user *p) +{ + struct sg_scsi_id __user *sg_idp = p; + + SG_LOG(3, sdp, "%s: SG_GET_SCSI_ID\n", __func__); + if (!access_ok(p, sizeof(struct sg_scsi_id))) + return -EFAULT; + + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + __put_user((int)sdev->host->host_no, + &sg_idp->host_no); + __put_user((int)sdev->channel, &sg_idp->channel); + __put_user((int)sdev->id, &sg_idp->scsi_id); + __put_user((int)sdev->lun, &sg_idp->lun); + __put_user((int)sdev->type, &sg_idp->scsi_type); + __put_user((short)sdev->host->cmd_per_lun, + &sg_idp->h_cmd_per_lun); + __put_user((short)sdev->queue_depth, + &sg_idp->d_queue_depth); + __put_user(0, &sg_idp->unused[0]); + __put_user(0, &sg_idp->unused[1]); + return 0; } static long sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) { - void __user *p = (void __user *)arg; + bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE); + bool check_detach = false; + int val; + int result = 0; + void __user *p = uptr64(arg); int __user *ip = p; - int result, val, read_only; struct sg_device *sdp; struct sg_fd *sfp; struct sg_request *srp; - unsigned long iflags; + struct scsi_device *sdev; + __maybe_unused const char *pmlp = ", pass to mid-level"; sfp = filp->private_data; sdp = sfp->parentdp; - if (!sdp) - return -ENXIO; SG_LOG(6, sdp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in, !!(filp->f_flags & O_NONBLOCK)); - read_only = (O_RDWR != (filp->f_flags & O_ACCMODE)); + if (!sdp) + return -ENXIO; + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + sdev = sdp->device; switch (cmd_in) { case SG_IO: - if (atomic_read(&sdp->detaching)) - return -ENODEV; - if (!scsi_block_when_processing_errors(sdp->device)) - return -ENXIO; - if (!access_ok(p, SZ_SG_IO_HDR)) - return -EFAULT; - result = sg_new_write(sfp, filp, p, SZ_SG_IO_HDR, - 1, read_only, 1, &srp); - if (result < 0) - return result; - result = wait_event_interruptible(sfp->read_wait, - (srp_done(sfp, srp) || atomic_read(&sdp->detaching))); - if (atomic_read(&sdp->detaching)) - return -ENODEV; - write_lock_irq(&sfp->rq_list_lock); - if (srp->done) { - srp->done = 2; - write_unlock_irq(&sfp->rq_list_lock); - result = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp); - return (result < 0) ? 
result : 0; - } - srp->orphan = 1; - write_unlock_irq(&sfp->rq_list_lock); - return result; /* -ERESTARTSYS because signal hit process */ - case SG_SET_TIMEOUT: - result = get_user(val, ip); - if (result) - return result; - if (val < 0) - return -EIO; - if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ)) - val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ), - INT_MAX); - sfp->timeout_user = val; - sfp->timeout = mult_frac(val, HZ, USER_HZ); - - return 0; - case SG_GET_TIMEOUT: /* N.B. User receives timeout as return value */ - /* strange ..., for backward compatibility */ - return sfp->timeout_user; - case SG_SET_FORCE_LOW_DMA: - /* - * N.B. This ioctl never worked properly, but failed to - * return an error value. So returning '0' to keep compability - * with legacy applications. - */ - return 0; - case SG_GET_LOW_DMA: - return put_user((int) sdp->device->host->unchecked_isa_dma, ip); + return sg_ctl_sg_io(filp, sdp, sfp, p); case SG_GET_SCSI_ID: - if (!access_ok(p, sizeof (sg_scsi_id_t))) - return -EFAULT; - else { - sg_scsi_id_t __user *sg_idp = p; - - if (atomic_read(&sdp->detaching)) - return -ENODEV; - __put_user((int) sdp->device->host->host_no, - &sg_idp->host_no); - __put_user((int) sdp->device->channel, - &sg_idp->channel); - __put_user((int) sdp->device->id, &sg_idp->scsi_id); - __put_user((int) sdp->device->lun, &sg_idp->lun); - __put_user((int) sdp->device->type, &sg_idp->scsi_type); - __put_user((short) sdp->device->host->cmd_per_lun, - &sg_idp->h_cmd_per_lun); - __put_user((short) sdp->device->queue_depth, - &sg_idp->d_queue_depth); - __put_user(0, &sg_idp->unused[0]); - __put_user(0, &sg_idp->unused[1]); - return 0; - } + return sg_ctl_scsi_id(sdev, sdp, p); case SG_SET_FORCE_PACK_ID: + SG_LOG(3, sdp, "%s: SG_SET_FORCE_PACK_ID\n", __func__); result = get_user(val, ip); if (result) return result; - sfp->force_packid = val ? 
1 : 0; + assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val); return 0; - case SG_GET_PACK_ID: - if (!access_ok(ip, sizeof (int))) - return -EFAULT; - read_lock_irqsave(&sfp->rq_list_lock, iflags); - list_for_each_entry(srp, &sfp->rq_list, entry) { - if ((1 == srp->done) && (!srp->sg_io_owned)) { - read_unlock_irqrestore(&sfp->rq_list_lock, - iflags); - __put_user(srp->header.pack_id, ip); - return 0; + case SG_GET_PACK_ID: /* or tag of oldest "read"-able, -1 if none */ + rcu_read_lock(); + val = -1; + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (SG_RS_AWAIT_READ(srp) && + !test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) { + val = srp->pack_id; + break; } } - read_unlock_irqrestore(&sfp->rq_list_lock, iflags); - __put_user(-1, ip); - return 0; + rcu_read_unlock(); + SG_LOG(3, sdp, "%s: SG_GET_PACK_ID=%d\n", __func__, val); + return put_user(val, ip); case SG_GET_NUM_WAITING: - read_lock_irqsave(&sfp->rq_list_lock, iflags); - val = 0; - list_for_each_entry(srp, &sfp->rq_list, entry) { - if ((1 == srp->done) && (!srp->sg_io_owned)) - ++val; - } - read_unlock_irqrestore(&sfp->rq_list_lock, iflags); + /* SG_GET_NUM_WAITING + num_inflight == SG_SEIRV_SUBMITTED */ + val = atomic_read(&sfp->waiting); + SG_LOG(3, sdp, "%s: SG_GET_NUM_WAITING=%d\n", __func__, + val); return put_user(val, ip); case SG_GET_SG_TABLESIZE: - return put_user(sdp->sg_tablesize, ip); + SG_LOG(3, sdp, "%s: SG_GET_SG_TABLESIZE=%d\n", __func__, + sdp->max_sgat_elems); + return put_user(sdp->max_sgat_elems, ip); case SG_SET_RESERVED_SIZE: - result = get_user(val, ip); - if (result) - return result; - if (val < 0) - return -EINVAL; - val = min_t(int, val, - max_sectors_bytes(sdp->device->request_queue)); mutex_lock(&sfp->f_mutex); - if (val != sfp->reserve.bufflen) { - if (sfp->mmap_called || - sfp->res_in_use) { - mutex_unlock(&sfp->f_mutex); - return -EBUSY; + result = get_user(val, ip); + if (!result) { + if (val >= 0 && val <= (1024 * 1024 * 1024)) { + result = sg_set_reserved_sz(sfp, val); + } else { + SG_LOG(3, sdp, "%s: invalid size\n", __func__); + result = -EINVAL; } - - sg_remove_scat(sfp, &sfp->reserve); - sg_build_reserve(sfp, val); } mutex_unlock(&sfp->f_mutex); - return 0; + return result; case SG_GET_RESERVED_SIZE: - val = min_t(int, sfp->reserve.bufflen, - max_sectors_bytes(sdp->device->request_queue)); - return put_user(val, ip); + mutex_lock(&sfp->f_mutex); + val = min_t(int, sfp->rsv_srp->sgat_h.buflen, + sdp->max_sgat_sz); + SG_LOG(3, sdp, "%s: SG_GET_RESERVED_SIZE=%d\n", + __func__, val); + result = put_user(val, ip); + mutex_unlock(&sfp->f_mutex); + return result; case SG_SET_COMMAND_Q: + SG_LOG(3, sdp, "%s: SG_SET_COMMAND_Q\n", __func__); result = get_user(val, ip); if (result) return result; - sfp->cmd_q = val ? 
1 : 0; + assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val); return 0; case SG_GET_COMMAND_Q: - return put_user((int) sfp->cmd_q, ip); + SG_LOG(3, sdp, "%s: SG_GET_COMMAND_Q\n", __func__); + return put_user(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm), ip); case SG_SET_KEEP_ORPHAN: + SG_LOG(3, sdp, "%s: SG_SET_KEEP_ORPHAN\n", __func__); result = get_user(val, ip); if (result) return result; - sfp->keep_orphan = val; + assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, !!val); return 0; case SG_GET_KEEP_ORPHAN: - return put_user((int) sfp->keep_orphan, ip); - case SG_NEXT_CMD_LEN: + SG_LOG(3, sdp, "%s: SG_GET_KEEP_ORPHAN\n", __func__); + return put_user(test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm), + ip); + case SG_GET_VERSION_NUM: + SG_LOG(3, sdp, "%s: SG_GET_VERSION_NUM\n", __func__); + return put_user(sg_version_num, ip); + case SG_GET_REQUEST_TABLE: + return sg_ctl_req_tbl(sfp, p); + case SG_SCSI_RESET: + SG_LOG(3, sdp, "%s: SG_SCSI_RESET\n", __func__); + check_detach = true; + break; + case SG_SET_TIMEOUT: + SG_LOG(3, sdp, "%s: SG_SET_TIMEOUT\n", __func__); + result = get_user(val, ip); + if (result) + return result; + if (val < 0) + return -EIO; + if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ)) + val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ), + INT_MAX); + sfp->timeout_user = val; + sfp->timeout = mult_frac(val, HZ, USER_HZ); + return 0; + case SG_GET_TIMEOUT: /* N.B. User receives timeout as return value */ + /* strange ..., for backward compatibility */ + SG_LOG(3, sdp, "%s: SG_GET_TIMEOUT\n", __func__); + return sfp->timeout_user; + case SG_SET_FORCE_LOW_DMA: + /* + * N.B. This ioctl never worked properly, but failed to + * return an error value. So returning '0' to keep + * compatibility with legacy applications. + */ + SG_LOG(3, sdp, "%s: SG_SET_FORCE_LOW_DMA\n", __func__); + return 0; + case SG_GET_LOW_DMA: + SG_LOG(3, sdp, "%s: SG_GET_LOW_DMA\n", __func__); + return put_user((int)sdev->host->unchecked_isa_dma, ip); + case SG_NEXT_CMD_LEN: /* active only in v2 interface */ + SG_LOG(3, sdp, "%s: SG_NEXT_CMD_LEN\n", __func__); result = get_user(val, ip); if (result) return result; if (val > SG_MAX_CDB_SIZE) return -ENOMEM; - sfp->next_cmd_len = (val > 0) ? val : 0; + mutex_lock(&sfp->f_mutex); + sfp->next_cmd_len = max_t(int, val, 0); + mutex_unlock(&sfp->f_mutex); return 0; - case SG_GET_VERSION_NUM: - return put_user(sg_version_num, ip); case SG_GET_ACCESS_COUNT: + SG_LOG(3, sdp, "%s: SG_GET_ACCESS_COUNT\n", __func__); /* faked - we don't have a real access count anymore */ - val = (sdp->device ? 1 : 0); + val = (sdev ? 1 : 0); return put_user(val, ip); - case SG_GET_REQUEST_TABLE: - if (!access_ok(p, SZ_SG_REQ_INFO * SG_MAX_QUEUE)) - return -EFAULT; - else { - sg_req_info_t *rinfo; - - rinfo = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO, - GFP_KERNEL); - if (!rinfo) - return -ENOMEM; - read_lock_irqsave(&sfp->rq_list_lock, iflags); - sg_fill_request_table(sfp, rinfo); - read_unlock_irqrestore(&sfp->rq_list_lock, iflags); - result = __copy_to_user(p, rinfo, - SZ_SG_REQ_INFO * SG_MAX_QUEUE); - result = result ? 
-EFAULT : 0;
-        kfree(rinfo);
-        return result;
-    }
    case SG_EMULATED_HOST:
-        if (atomic_read(&sdp->detaching))
+        SG_LOG(3, sdp, "%s: SG_EMULATED_HOST\n", __func__);
+        if (unlikely(atomic_read(&sdp->detaching)))
            return -ENODEV;
-        return put_user(sdp->device->host->hostt->emulated, ip);
+        return put_user(sdev->host->hostt->emulated, ip);
    case SCSI_IOCTL_SEND_COMMAND:
-        if (atomic_read(&sdp->detaching))
+        SG_LOG(3, sdp, "%s: SCSI_IOCTL_SEND_COMMAND\n", __func__);
+        if (unlikely(atomic_read(&sdp->detaching)))
            return -ENODEV;
-        return sg_scsi_ioctl(sdp->device->request_queue, NULL, filp->f_mode, p);
+        return sg_scsi_ioctl(sdev->request_queue, NULL,
+                     filp->f_mode, p);
    case SG_SET_DEBUG:
+        SG_LOG(3, sdp, "%s: SG_SET_DEBUG\n", __func__);
        result = get_user(val, ip);
        if (result)
            return result;
-        sdp->sgdebug = (char) val;
+        sdp->sgdebug = (u8)val;
        return 0;
    case BLKSECTGET:
-        return put_user(max_sectors_bytes(sdp->device->request_queue),
-                ip);
+        SG_LOG(3, sdp, "%s: BLKSECTGET\n", __func__);
+        return put_user(max_sectors_bytes(sdev->request_queue), ip);
    case BLKTRACESETUP:
-        return blk_trace_setup(sdp->device->request_queue,
+        SG_LOG(3, sdp, "%s: BLKTRACESETUP\n", __func__);
+        return blk_trace_setup(sdev->request_queue,
                       sdp->disk->disk_name,
                       MKDEV(SCSI_GENERIC_MAJOR, sdp->index),
                       NULL, p);
    case BLKTRACESTART:
-        return blk_trace_startstop(sdp->device->request_queue, 1);
+        SG_LOG(3, sdp, "%s: BLKTRACESTART\n", __func__);
+        return blk_trace_startstop(sdev->request_queue, 1);
    case BLKTRACESTOP:
-        return blk_trace_startstop(sdp->device->request_queue, 0);
+        SG_LOG(3, sdp, "%s: BLKTRACESTOP\n", __func__);
+        return blk_trace_startstop(sdev->request_queue, 0);
    case BLKTRACETEARDOWN:
-        return blk_trace_remove(sdp->device->request_queue);
+        SG_LOG(3, sdp, "%s: BLKTRACETEARDOWN\n", __func__);
+        return blk_trace_remove(sdev->request_queue);
    case SCSI_IOCTL_GET_IDLUN:
+        SG_LOG(3, sdp, "%s: SCSI_IOCTL_GET_IDLUN %s\n", __func__,
+               pmlp);
+        check_detach = true;
+        break;
    case SCSI_IOCTL_GET_BUS_NUMBER:
+        SG_LOG(3, sdp, "%s: SCSI_IOCTL_GET_BUS_NUMBER%s\n",
+               __func__, pmlp);
+        check_detach = true;
+        break;
    case SCSI_IOCTL_PROBE_HOST:
+        SG_LOG(3, sdp, "%s: SCSI_IOCTL_PROBE_HOST%s\n", __func__,
+               pmlp);
+        check_detach = true;
+        break;
    case SG_GET_TRANSFORM:
-    case SG_SCSI_RESET:
-        if (atomic_read(&sdp->detaching))
-            return -ENODEV;
+        SG_LOG(3, sdp, "%s: SG_GET_TRANSFORM%s\n", __func__, pmlp);
+        check_detach = true;
+        break;
+    case SG_SET_TRANSFORM:
+        SG_LOG(3, sdp, "%s: SG_SET_TRANSFORM%s\n", __func__, pmlp);
+        check_detach = true;
        break;
    default:
+        SG_LOG(3, sdp, "%s: unrecognized ioctl [0x%x]%s\n",
+               __func__, cmd_in, pmlp);
        if (read_only)
-            return -EPERM;    /* don't know so take safe approach */
+            return -EPERM;    /* don't know, so take safer approach */
        break;
    }
-    result = scsi_ioctl_block_when_processing_errors(sdp->device,
-            cmd_in, filp->f_flags & O_NDELAY);
+    if (check_detach) {
+        if (unlikely(atomic_read(&sdp->detaching)))
+            return -ENODEV;
+    }
+    result = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NDELAY));
    if (result)
        return result;
-    return scsi_ioctl(sdp->device, cmd_in, p);
+    /* ioctls that reach here are forwarded to the mid-level */
+    return scsi_ioctl(sdev, cmd_in, p);
}

#if IS_ENABLED(CONFIG_COMPAT)
@@ -1232,43 +1723,36 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
}
#endif

+/*
+ * Implements the poll(2) system call for this driver. Returns various EPOLL*
+ * flags OR-ed together.
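For context, the asynchronous model that sg_poll() serves can be driven
entirely from user space: write() queues a command, poll() raises EPOLLIN
once a response is ready, and read() reaps it. Below is a minimal user-space
sketch (not part of this patch); the device path /dev/sg0 and the INQUIRY
cdb are only illustrative:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(void)
{
    unsigned char cdb[6] = {0x12, 0, 0, 0, 96, 0};  /* INQUIRY */
    unsigned char resp[96], sense[32];
    struct sg_io_hdr h;
    struct pollfd pfd;
    int fd, waiting = 0;

    fd = open("/dev/sg0", O_RDWR | O_NONBLOCK);
    if (fd < 0)
        return 1;
    memset(&h, 0, sizeof(h));
    h.interface_id = 'S';                /* v3 interface */
    h.dxfer_direction = SG_DXFER_FROM_DEV;
    h.cmd_len = sizeof(cdb);
    h.cmdp = cdb;
    h.dxfer_len = sizeof(resp);
    h.dxferp = resp;
    h.mx_sb_len = sizeof(sense);
    h.sbp = sense;
    h.timeout = 20000;                   /* unit: milliseconds */
    if (write(fd, &h, sizeof(h)) < 0)    /* async submission */
        return 1;
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 20000) > 0 && (pfd.revents & POLLIN)) {
        ioctl(fd, SG_GET_NUM_WAITING, &waiting);
        printf("%d response(s) waiting\n", waiting);
        if (read(fd, &h, sizeof(h)) == sizeof(h))   /* reap it */
            printf("scsi status=0x%x\n", h.status);
    }
    close(fd);
    return 0;
}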
+ */ static __poll_t sg_poll(struct file *filp, poll_table * wait) { __poll_t p_res = 0; struct sg_device *sdp; struct sg_fd *sfp; - struct sg_request *srp; - int count = 0; - unsigned long iflags; sfp = filp->private_data; - if (!sfp) - return EPOLLERR; sdp = sfp->parentdp; if (!sdp) return EPOLLERR; poll_wait(filp, &sfp->read_wait, wait); - read_lock_irqsave(&sfp->rq_list_lock, iflags); - list_for_each_entry(srp, &sfp->rq_list, entry) { - /* if any read waiting, flag it */ - if ((p_res == 0) && (srp->done == 1) && (!srp->sg_io_owned)) - p_res = EPOLLIN | EPOLLRDNORM; - ++count; - } - read_unlock_irqrestore(&sfp->rq_list_lock, iflags); + if (atomic_read(&sfp->waiting) > 0) + p_res = EPOLLIN | EPOLLRDNORM; - if (atomic_read(&sdp->detaching)) + if (unlikely(atomic_read(&sdp->detaching))) p_res |= EPOLLHUP; - else if (!sfp->cmd_q) { - if (0 == count) - p_res |= EPOLLOUT | EPOLLWRNORM; - } else if (count < SG_MAX_QUEUE) + else if (likely(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm))) + p_res |= EPOLLOUT | EPOLLWRNORM; + else if (atomic_read(&sfp->submitted) == 0) p_res |= EPOLLOUT | EPOLLWRNORM; SG_LOG(3, sdp, "%s: p_res=0x%x\n", __func__, (__force u32)p_res); return p_res; } +/* Entry point for fasync() related to fcntl(SET_FL(flags | O_ASYNC)) call */ static int sg_fasync(int fd, struct file *filp, int mode) { @@ -1283,15 +1767,17 @@ sg_fasync(int fd, struct file *filp, int mode) return fasync_helper(fd, filp, mode, &sfp->async_qp); } +/* Note: the error return: VM_FAULT_SIGBUS causes a "bus error" */ static vm_fault_t sg_vma_fault(struct vm_fault *vmf) { + int k, length; + unsigned long offset, len, sa, iflags; struct vm_area_struct *vma = vmf->vma; + struct sg_scatter_hold *rsv_schp; + struct sg_request *srp; struct sg_device *sdp; struct sg_fd *sfp; - unsigned long offset, len, sa; - struct sg_scatter_hold *rsv_schp; - int k, length; const char *nbp = "==NULL, bad"; if (!vma) { @@ -1304,26 +1790,45 @@ sg_vma_fault(struct vm_fault *vmf) goto out_err; } sdp = sfp->parentdp; - rsv_schp = &sfp->reserve; + if (sdp && unlikely(atomic_read(&sdp->detaching))) { + SG_LOG(1, sdp, "%s: device detaching\n", __func__); + goto out_err; + } + srp = sfp->rsv_srp; + if (!srp) { + SG_LOG(1, sdp, "%s: srp%s\n", __func__, nbp); + goto out_err; + } + spin_lock_irqsave(&srp->req_lck, iflags); + rsv_schp = &srp->sgat_h; offset = vmf->pgoff << PAGE_SHIFT; - if (offset >= rsv_schp->bufflen) - return VM_FAULT_SIGBUS; + if (offset >= (unsigned int)rsv_schp->buflen) { + SG_LOG(1, sdp, "%s: offset[%lu] >= rsv.buflen\n", __func__, + offset); + goto out_err_unlock; + } sa = vma->vm_start; SG_LOG(3, sdp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset); length = 1 << (PAGE_SHIFT + rsv_schp->page_order); - for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) { + for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; ++k) { len = vma->vm_end - sa; - len = (len < length) ? 
len : length; + len = min_t(int, len, (int)length); if (offset < len) { - struct page *page = nth_page(rsv_schp->pages[k], - offset >> PAGE_SHIFT); - get_page(page); /* increment page count */ + struct page *page; + struct page *pp; + + pp = rsv_schp->pages[k]; + spin_unlock_irqrestore(&srp->req_lck, iflags); + page = nth_page(pp, offset >> PAGE_SHIFT); + get_page(page); /* increment page count */ vmf->page = page; return 0; /* success */ } sa += len; offset -= len; } +out_err_unlock: + spin_unlock_irqrestore(&srp->req_lck, iflags); out_err: return VM_FAULT_SIGBUS; } @@ -1332,14 +1837,16 @@ static const struct vm_operations_struct sg_mmap_vm_ops = { .fault = sg_vma_fault, }; +/* Entry point for mmap(2) system call */ static int sg_mmap(struct file *filp, struct vm_area_struct *vma) { - struct sg_fd *sfp; - unsigned long req_sz, len, sa; - struct sg_scatter_hold *rsv_schp; int k, length; int ret = 0; + unsigned long req_sz, len, sa, iflags; + struct sg_scatter_hold *rsv_schp; + struct sg_fd *sfp; + struct sg_request *srp; if (!filp || !vma) return -ENXIO; @@ -1352,141 +1859,207 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma) SG_LOG(3, sfp->parentdp, "%s: vm_start=%p, len=%d\n", __func__, (void *)vma->vm_start, (int)req_sz); if (vma->vm_pgoff) - return -EINVAL; /* want no offset */ - rsv_schp = &sfp->reserve; + return -EINVAL; /* only an offset of 0 accepted */ + /* Check reserve request is inactive and has large enough buffer */ mutex_lock(&sfp->f_mutex); - if (req_sz > rsv_schp->bufflen) { - ret = -ENOMEM; /* cannot map more than reserved buffer */ + srp = sfp->rsv_srp; + spin_lock_irqsave(&srp->req_lck, iflags); + if (SG_RS_ACTIVE(srp)) { + ret = -EBUSY; + goto out; + } + rsv_schp = &srp->sgat_h; + if (req_sz > (unsigned long)rsv_schp->buflen) { + ret = -ENOMEM; goto out; } - sa = vma->vm_start; length = 1 << (PAGE_SHIFT + rsv_schp->page_order); - for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) { + for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; ++k) { len = vma->vm_end - sa; - len = (len < length) ? len : length; + len = min_t(unsigned long, len, (unsigned long)length); sa += len; } - sfp->mmap_called = 1; + set_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm); vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP; vma->vm_private_data = sfp; vma->vm_ops = &sg_mmap_vm_ops; out: + spin_unlock_irqrestore(&srp->req_lck, iflags); mutex_unlock(&sfp->f_mutex); return ret; } +/* + * This user context function is called from sg_rq_end_io() in 2 situations. + * The first case is when a slave was in SWAIT state and sg_rq_end_io() has + * just been called on the corresponding master request. + * The second case is an orphaned request that needs to be cleaned up (e.g. + * when control C is typed while an ioctl(SG_IO) is active). 
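To make the orphan case concrete, here is a hedged user-space sketch (not
part of this patch) of how a program might opt into the keep-orphan
behaviour; the helper name sg_io_keeping_orphans is invented, the EINTR
outcome depends on signal timing, and the fd is assumed to have been opened
O_RDWR | O_NONBLOCK:

#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/*
 * Issue a prepared v3 request with ioctl(SG_IO). If a signal interrupts
 * the wait, the command becomes an orphan inside the driver; because
 * SG_SET_KEEP_ORPHAN was set, the response is kept and can still be
 * fetched later with read().
 */
static int sg_io_keeping_orphans(int fd, struct sg_io_hdr *hp)
{
    int one = 1;
    int res;

    ioctl(fd, SG_SET_KEEP_ORPHAN, &one);
    res = ioctl(fd, SG_IO, hp);
    if (res < 0 && errno == EINTR) {
        while ((res = read(fd, hp, sizeof(*hp))) < 0 &&
               errno == EAGAIN)
            usleep(10000);    /* poll(2) would also do */
    }
    return res < 0 ? -errno : 0;
}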
+ */
static void
sg_rq_end_io_usercontext(struct work_struct *work)
{
-    struct sg_request *srp = container_of(work, struct sg_request, ew.work);
-    struct sg_fd *sfp = srp->parentfp;
+    struct sg_request *srp = container_of(work, struct sg_request,
+                           ew_orph.work);
+    struct sg_fd *sfp;

-    sg_finish_rem_req(srp);
-    sg_remove_request(sfp, srp);
+    if (!srp) {
+        WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
+        return;
+    }
+    sfp = srp->parentfp;
+    if (!sfp) {
+        WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
+        return;
+    }
+    SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp);
+    if (test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm)) {
+        sg_finish_scsi_blk_rq(srp);    /* clean up orphan case */
+        sg_deact_request(sfp, srp);
+    }
    kref_put(&sfp->f_ref, sg_remove_sfp);
}

+static void
+sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len)
+{
+    int driver_stat;
+    u32 rq_res = srp->rq_result;
+    struct scsi_request *scsi_rp = scsi_req(srp->rq);
+    u8 *sbp = scsi_rp ? scsi_rp->sense : NULL;
+
+    if (!sbp)
+        return;
+    driver_stat = driver_byte(rq_res);
+    if (driver_stat & DRIVER_SENSE) {
+        struct scsi_sense_hdr ssh;
+
+        if (scsi_normalize_sense(sbp, sense_len, &ssh)) {
+            if (!scsi_sense_is_deferred(&ssh)) {
+                if (ssh.sense_key == UNIT_ATTENTION) {
+                    if (sdp->device->removable)
+                        sdp->device->changed = 1;
+                }
+            }
+        }
+    }
+    if (sdp->sgdebug > 0) {
+        int scsi_stat = rq_res & 0xff;
+
+        if (scsi_stat == SAM_STAT_CHECK_CONDITION ||
+            scsi_stat == SAM_STAT_COMMAND_TERMINATED)
+            __scsi_print_sense(sdp->device, __func__, sbp,
+                       sense_len);
+    }
+}
+
 /*
- * This function is a "bottom half" handler that is called by the mid
- * level when a command is completed (or has failed).
+ * This "bottom half" (soft interrupt) handler is called by the mid-level
+ * when a request has completed or failed. This callback is registered in a
+ * blk_execute_rq_nowait() call in sg_common_write(). For ioctl(SG_IO)
+ * (sync) usage, sg_ctl_sg_io() waits to be woken up by this callback.
 */
static void
sg_rq_end_io(struct request *rq, blk_status_t status)
{
+    enum sg_rq_state rqq_state = SG_RS_AWAIT_RD;
+    int a_resid, slen;
    struct sg_request *srp = rq->end_io_data;
-    struct scsi_request *req = scsi_req(rq);
+    struct scsi_request *scsi_rp = scsi_req(rq);
    struct sg_device *sdp;
    struct sg_fd *sfp;
-    unsigned long iflags;
-    unsigned int ms;
-    char *sense;
-    int result, resid, done = 1;

-    if (WARN_ON(srp->done != 0))
+    if (!scsi_rp) {
+        WARN_ONCE(1, "%s: scsi_req(rq) unexpectedly NULL\n", __func__);
        return;
-
-    sfp = srp->parentfp;
-    if (WARN_ON(sfp == NULL))
+    }
+    if (!srp) {
+        WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
        return;
-
+    }
+    /* Expect 0 --> 1 transition, otherwise processed elsewhere */
+    if (unlikely(test_and_set_bit(SG_FRQ_BLK_PUT_REQ, srp->frq_bm))) {
+        pr_info("%s: srp=%p already completed\n", __func__, srp);
+        return;
+    }
+    if (WARN_ON(atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)) {
+        pr_warn("%s: bad rq_st=%d\n", __func__,
+            atomic_read(&srp->rq_st));
+        goto early_err;
+    }
+    sfp = srp->parentfp;
+    if (unlikely(!sfp)) {
+        WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
+        goto early_err;
+    }
    sdp = sfp->parentdp;
    if (unlikely(atomic_read(&sdp->detaching)))
        pr_info("%s: device detaching\n", __func__);

-    sense = req->sense;
-    result = req->result;
-    resid = req->resid_len;
-
-    srp->header.resid = resid;
-    SG_LOG(6, sdp, "%s: pack_id=%d, res=0x%x\n", __func__,
-           srp->header.pack_id, result);
-    ms = jiffies_to_msecs(jiffies);
-    srp->header.duration = (ms > srp->header.duration) ?
- (ms - srp->header.duration) : 0; - if (0 != result) { - struct scsi_sense_hdr sshdr; - - srp->header.status = 0xff & result; - srp->header.masked_status = status_byte(result); - srp->header.msg_status = msg_byte(result); - srp->header.host_status = host_byte(result); - srp->header.driver_status = driver_byte(result); - if ((sdp->sgdebug > 0) && - ((CHECK_CONDITION == srp->header.masked_status) || - (COMMAND_TERMINATED == srp->header.masked_status))) - __scsi_print_sense(sdp->device, __func__, sense, - SCSI_SENSE_BUFFERSIZE); - - /* Following if statement is a patch supplied by Eric Youngdale */ - if (driver_byte(result) != 0 - && scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr) - && !scsi_sense_is_deferred(&sshdr) - && sshdr.sense_key == UNIT_ATTENTION - && sdp->device->removable) { - /* Detected possible disc change. Set the bit - this */ - /* may be used if there are filesystems using this device */ - sdp->device->changed = 1; + srp->rq_result = scsi_rp->result; + slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE); + a_resid = scsi_rp->resid_len; + + if (a_resid) + srp->in_resid = a_resid; + + SG_LOG(6, sdp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id, + srp->rq_result); + srp->duration = sg_calc_rq_dur(srp); + if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) && slen > 0)) + sg_check_sense(sdp, srp, slen); + if (slen > 0) { + if (scsi_rp->sense) { + srp->sense_bp = kzalloc(SCSI_SENSE_BUFFERSIZE, + GFP_ATOMIC); + if (srp->sense_bp) + memcpy(srp->sense_bp, scsi_rp->sense, slen); + } else { + pr_warn("%s: scsi_request::sense==NULL\n", __func__); + slen = 0; } } - - if (req->sense_len) - memcpy(srp->sense_b, req->sense, SCSI_SENSE_BUFFERSIZE); - - /* Rely on write phase to clean out srp status values, so no "else" */ - + srp->sense_len = slen; + if (unlikely(test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))) { + spin_lock(&srp->req_lck); + if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) { + clear_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); + } else { + rqq_state = SG_RS_BUSY; + set_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm); + } + spin_unlock(&srp->req_lck); + } + if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) + atomic_inc(&sfp->waiting); + if (unlikely(sg_rstate_chg(srp, SG_RS_INFLIGHT, rqq_state))) + pr_warn("%s: can't set rq_st\n", __func__); /* - * Free the request as soon as it is complete so that its resources - * can be reused without waiting for userspace to read() the - * result. But keep the associated bio (if any) around until - * blk_rq_unmap_user() can be called from user context. + * Free the mid-level resources apart from the bio (if any). The bio's + * blk_rq_unmap_user() can be called later from user context. */ srp->rq = NULL; - scsi_req_free_cmd(scsi_req(rq)); + scsi_req_free_cmd(scsi_rp); blk_put_request(rq); - write_lock_irqsave(&sfp->rq_list_lock, iflags); - if (unlikely(srp->orphan)) { - if (sfp->keep_orphan) - srp->sg_io_owned = 0; - else - done = 0; - } - srp->done = done; - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - - if (likely(done)) { - /* Now wake up any sg_read() that is waiting for this - * packet. 
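On the user side, the status and sense bytes that the completion path above
copies back surface in the v3 header fields. A small illustrative helper
(not part of this patch; it assumes fixed-format sense data) might look
like:

#include <stdio.h>
#include <scsi/sg.h>

/* Inspect a completed v3 request, mirroring what the driver stored:
 * sb_len_wr is the number of sense bytes actually written back. */
static void show_result(const struct sg_io_hdr *hp)
{
    if (hp->status == 0 && hp->host_status == 0 &&
        hp->driver_status == 0) {
        printf("good, resid=%d duration=%ums\n", hp->resid,
               hp->duration);
        return;
    }
    if (hp->sb_len_wr >= 14) {    /* assumes fixed-format sense */
        const unsigned char *sb = hp->sbp;

        printf("sense key=0x%x asc=0x%x ascq=0x%x\n",
               sb[2] & 0xf, sb[12], sb[13]);
    }
}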
- */ + if (likely(rqq_state == SG_RS_AWAIT_RD)) { + /* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); - } else { - INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); - schedule_work(&srp->ew.work); + } else { /* clean up orphaned request that aren't being kept */ + INIT_WORK(&srp->ew_orph.work, sg_rq_end_io_usercontext); + schedule_work(&srp->ew_orph.work); } + return; + +early_err: + srp->rq = NULL; + if (scsi_rp) + scsi_req_free_cmd(scsi_rp); + blk_put_request(rq); } static const struct file_operations sg_fops = { @@ -1507,16 +2080,16 @@ static const struct file_operations sg_fops = { static struct class *sg_sysfs_class; -static int sg_sysfs_valid = 0; +static bool sg_sysfs_valid; +/* Returns valid pointer to sg_device or negated errno twisted by ERR_PTR */ static struct sg_device * sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) { - struct request_queue *q = scsidp->request_queue; struct sg_device *sdp; - unsigned long iflags; int error; u32 k; + unsigned long iflags; sdp = kzalloc(sizeof(struct sg_device), GFP_KERNEL); if (!sdp) @@ -1534,7 +2107,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) error = -ENODEV; } else { sdev_printk(KERN_WARNING, scsidp, - "%s: idr alloc sg_device failure: %d\n", + "%s: idr allocation sg_device failure: %d\n", __func__, error); } goto out_unlock; @@ -1551,8 +2124,8 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) INIT_LIST_HEAD(&sdp->sfds); init_waitqueue_head(&sdp->open_wait); atomic_set(&sdp->detaching, 0); - rwlock_init(&sdp->sfd_lock); - sdp->sg_tablesize = queue_max_segments(q); + rwlock_init(&sdp->sfd_llock); + sg_calc_sgat_param(sdp); sdp->index = k; kref_init(&sdp->d_ref); error = 0; @@ -1685,13 +2258,13 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf) SG_LOG(3, sdp, "%s: 0x%p\n", __func__, sdp); - read_lock_irqsave(&sdp->sfd_lock, iflags); - list_for_each_entry(sfp, &sdp->sfds, sfd_siblings) { + read_lock_irqsave(&sdp->sfd_llock, iflags); + list_for_each_entry(sfp, &sdp->sfds, sfd_entry) { wake_up_interruptible_all(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP); } wake_up_interruptible_all(&sdp->open_wait); - read_unlock_irqrestore(&sdp->sfd_lock, iflags); + read_unlock_irqrestore(&sdp->sfd_llock, iflags); sysfs_remove_link(&scsidp->sdev_gendev.kobj, "generic"); device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index)); @@ -1722,32 +2295,39 @@ init_sg(void) { int rc; - if (scatter_elem_sz < PAGE_SIZE) { + /* check scatter_elem_sz module parameter, change if inappropriate */ + if (scatter_elem_sz < (int)PAGE_SIZE) scatter_elem_sz = PAGE_SIZE; - scatter_elem_sz_prev = scatter_elem_sz; - } + else if (scatter_elem_sz != (1 << ilog2(scatter_elem_sz))) + scatter_elem_sz = 1 << ilog2(scatter_elem_sz); + /* scatter_elem_sz rounded down to power of 2, or PAGE_SIZE */ if (def_reserved_size >= 0) sg_big_buff = def_reserved_size; else def_reserved_size = sg_big_buff; - rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), + rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS, "sg"); if (rc) return rc; - sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic"); - if ( IS_ERR(sg_sysfs_class) ) { + + pr_info("Registered %s[char major=0x%x], version: %s, date: %s\n", + "sg device ", SCSI_GENERIC_MAJOR, SG_VERSION_STR, + sg_version_date); + sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic"); + if 
(IS_ERR(sg_sysfs_class)) { rc = PTR_ERR(sg_sysfs_class); - goto err_out; - } - sg_sysfs_valid = 1; + goto err_out_unreg; + } + sg_sysfs_valid = true; rc = scsi_register_interface(&sg_interface); if (0 == rc) { sg_proc_init(); return 0; } class_destroy(sg_sysfs_class); -err_out: + +err_out_unreg: unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS); return rc; } @@ -1767,40 +2347,72 @@ exit_sg(void) remove_proc_subtree("scsi/sg", NULL); scsi_unregister_interface(&sg_interface); class_destroy(sg_sysfs_class); - sg_sysfs_valid = 0; + sg_sysfs_valid = false; unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS); idr_destroy(&sg_index_idr); } +static inline bool +sg_chk_dio_allowed(struct sg_device *sdp, struct sg_fd *sfp, + struct sg_request *srp, int iov_count, int dir) +{ + if (sg_allow_dio && (srp->rq_flags & SG_FLAG_DIRECT_IO)) { + if (dir != SG_DXFER_UNKNOWN && !iov_count) { + if (!sdp->device->host->unchecked_isa_dma) + return true; + } + } + return false; +} + +static void +sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid, + struct rq_map_data *mdp) +{ + memset(mdp, 0, sizeof(*mdp)); + mdp->pages = schp->pages; + mdp->page_order = schp->page_order; + mdp->nr_entries = schp->num_sgat; + mdp->offset = 0; + mdp->null_mapped = !up_valid; +} + static int -sg_start_req(struct sg_request *srp, u8 *cmd) +sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir) { - int res; + bool reserved, us_xfer; + int res = 0; + int dxfer_len = 0; + int r0w = READ; + unsigned int iov_count = 0; + void __user *up; struct request *rq; - struct scsi_request *req; - struct sg_device *sdp; + struct scsi_request *scsi_rp; struct sg_fd *sfp = srp->parentfp; - sg_io_hdr_t *hp = &srp->header; - int dxfer_len = (int) hp->dxfer_len; - int dxfer_dir = hp->dxfer_direction; - unsigned int iov_count = hp->iovec_count; - struct sg_scatter_hold *req_schp = &srp->data; - struct sg_scatter_hold *rsv_schp = &sfp->reserve; - struct request_queue *q = sfp->parentdp->device->request_queue; - struct rq_map_data *md, map_data; - int r0w = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ; + struct sg_device *sdp; + struct sg_scatter_hold *req_schp; + struct request_queue *q; + struct rq_map_data *md = (void *)srp; /* want any non-NULL value */ u8 *long_cmdp = NULL; + __maybe_unused const char *cp = ""; + struct sg_slice_hdr3 *sh3p = &srp->s_hdr3; + struct rq_map_data map_data; sdp = sfp->parentdp; - if (hp->cmd_len > BLK_MAX_CDB) { - long_cmdp = kzalloc(hp->cmd_len, GFP_KERNEL); + if (cmd_len > BLK_MAX_CDB) { /* for longer SCSI cdb_s */ + long_cmdp = kzalloc(cmd_len, GFP_KERNEL); if (!long_cmdp) return -ENOMEM; SG_LOG(5, sdp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp); } + up = sh3p->dxferp; + dxfer_len = (int)sh3p->dxfer_len; + iov_count = sh3p->iovec_count; + r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ; SG_LOG(4, sdp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len, (r0w ? "OUT" : "IN")); + q = sdp->device->request_queue; /* * NOTE @@ -1813,234 +2425,247 @@ sg_start_req(struct sg_request *srp, u8 *cmd) * do not want to use BLK_MQ_REQ_NOWAIT here because userspace might * not expect an EWOULDBLOCK from this condition. */ - rq = blk_get_request(q, hp->dxfer_direction == SG_DXFER_TO_DEV ? - REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, 0); - if (IS_ERR(rq)) { + rq = blk_get_request(q, (r0w ? 
REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), 0); + if (unlikely(IS_ERR(rq))) { kfree(long_cmdp); return PTR_ERR(rq); } - req = scsi_req(rq); - - if (hp->cmd_len > BLK_MAX_CDB) - req->cmd = long_cmdp; - memcpy(req->cmd, cmd, hp->cmd_len); - req->cmd_len = hp->cmd_len; - + /* current sg_request protected by SG_RS_BUSY state */ + scsi_rp = scsi_req(rq); srp->rq = rq; - rq->end_io_data = srp; - req->retries = SG_DEFAULT_RETRIES; - - if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE)) - return 0; - if (sg_allow_dio && hp->flags & SG_FLAG_DIRECT_IO && - dxfer_dir != SG_DXFER_UNKNOWN && !iov_count && - !sfp->parentdp->device->host->unchecked_isa_dma && - blk_rq_aligned(q, (unsigned long)hp->dxferp, dxfer_len)) + if (cmd_len > BLK_MAX_CDB) + scsi_rp->cmd = long_cmdp; + memcpy(scsi_rp->cmd, cmd, cmd_len); + scsi_rp->cmd_len = cmd_len; + us_xfer = !(srp->rq_flags & SG_FLAG_NO_DXFER); + assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer); + reserved = (sfp->rsv_srp == srp); + rq->end_io_data = srp; + scsi_rp->retries = SG_DEFAULT_RETRIES; + req_schp = &srp->sgat_h; + + if (dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE) { + SG_LOG(4, sdp, "%s: no data xfer [0x%p]\n", __func__, srp); + set_bit(SG_FRQ_NO_US_XFER, srp->frq_bm); + goto fini; /* path of reqs with no din nor dout */ + } else if (sg_chk_dio_allowed(sdp, sfp, srp, iov_count, dxfer_dir) && + blk_rq_aligned(q, (unsigned long)up, dxfer_len)) { + set_bit(SG_FRQ_DIO_IN_USE, srp->frq_bm); + srp->rq_info |= SG_INFO_DIRECT_IO; md = NULL; - else + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) + cp = "direct_io, "; + } else { /* normal IO and failed conditions for dio path */ md = &map_data; + } - if (md) { - mutex_lock(&sfp->f_mutex); - if (dxfer_len <= rsv_schp->bufflen && - !sfp->res_in_use) { - sfp->res_in_use = 1; - sg_link_reserve(sfp, srp, dxfer_len); - } else if (hp->flags & SG_FLAG_MMAP_IO) { - res = -EBUSY; /* sfp->res_in_use == 1 */ - if (dxfer_len > rsv_schp->bufflen) - res = -ENOMEM; - mutex_unlock(&sfp->f_mutex); - return res; - } else { - res = sg_build_indirect(req_schp, sfp, dxfer_len); - if (res) { - mutex_unlock(&sfp->f_mutex); - return res; - } + if (likely(md)) { /* normal, "indirect" IO */ + if (unlikely((srp->rq_flags & SG_FLAG_MMAP_IO))) { + /* mmap IO must use and fit in reserve request */ + if (!reserved || dxfer_len > req_schp->buflen) + res = reserved ? -ENOMEM : -EBUSY; + } else if (req_schp->buflen == 0) { + int up_sz = max_t(int, dxfer_len, sfp->sgat_elem_sz); + + res = sg_mk_sgat(srp, sfp, up_sz); } - mutex_unlock(&sfp->f_mutex); + if (res) + goto fini; - md->pages = req_schp->pages; - md->page_order = req_schp->page_order; - md->nr_entries = req_schp->k_use_sg; - md->offset = 0; - md->null_mapped = hp->dxferp ? 
0 : 1; - if (dxfer_dir == SG_DXFER_TO_FROM_DEV) - md->from_user = 1; - else - md->from_user = 0; + sg_set_map_data(req_schp, !!up, md); + md->from_user = (dxfer_dir == SG_DXFER_TO_FROM_DEV); } - if (iov_count) { + if (unlikely(iov_count)) { struct iovec *iov = NULL; struct iov_iter i; - res = import_iovec(r0w, hp->dxferp, iov_count, 0, &iov, &i); + res = import_iovec(r0w, up, iov_count, 0, &iov, &i); if (res < 0) - return res; + goto fini; - iov_iter_truncate(&i, hp->dxfer_len); + iov_iter_truncate(&i, dxfer_len); if (!iov_iter_count(&i)) { kfree(iov); - return -EINVAL; + res = -EINVAL; + goto fini; } - res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC); + if (us_xfer) + res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC); kfree(iov); - } else - res = blk_rq_map_user(q, rq, md, hp->dxferp, - hp->dxfer_len, GFP_ATOMIC); - - if (!res) { - srp->bio = rq->bio; - - if (!md) { - req_schp->dio_in_use = 1; - hp->info |= SG_INFO_DIRECT_IO; + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) + cp = "iov_count > 0"; + } else if (us_xfer) { /* setup for transfer data to/from user space */ + res = blk_rq_map_user(q, rq, md, up, dxfer_len, GFP_ATOMIC); + if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res) + SG_LOG(1, sdp, "%s: blk_rq_map_user() res=%d\n", + __func__, res); + } +fini: + if (likely(res == 0)) { + res = sg_rstate_chg(srp, SG_RS_BUSY, SG_RS_INFLIGHT); + if (likely(res == 0)) + srp->bio = rq->bio; + } + if (unlikely(res && rq)) { /* failure, free up resources */ + scsi_req_free_cmd(scsi_rp); + if (likely(!test_and_set_bit(SG_FRQ_BLK_PUT_REQ, + srp->frq_bm))) { + srp->rq = NULL; + blk_put_request(rq); } } + SG_LOG((res ? 1 : 4), sdp, "%s: %s res=%d [0x%p]\n", __func__, cp, + res, srp); return res; } -static int -sg_finish_rem_req(struct sg_request *srp) +/* + * Clean up mid-level and block layer resources of finished request. Sometimes + * blk_rq_unmap_user() returns -4 (-EINTR) and this is why: "If we're in a + * workqueue, the request is orphaned, so don't copy into a random user + * address space, just free and return -EINTR so user space doesn't expect + * any data." [block/bio.c] + */ +static void +sg_finish_scsi_blk_rq(struct sg_request *srp) { - int ret = 0; - + int ret; struct sg_fd *sfp = srp->parentfp; - struct sg_scatter_hold *req_schp = &srp->data; SG_LOG(4, sfp->parentdp, "%s: srp=0x%p%s\n", __func__, srp, - (srp->res_used) ? " rsv" : ""); - if (srp->bio) - ret = blk_rq_unmap_user(srp->bio); - - if (srp->rq) { - scsi_req_free_cmd(scsi_req(srp->rq)); - blk_put_request(srp->rq); + (srp->parentfp->rsv_srp == srp) ? 
" rsv" : ""); + if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) { + atomic_dec(&sfp->submitted); + atomic_dec(&sfp->waiting); + } + if (srp->bio) { + bool us_xfer = !test_bit(SG_FRQ_NO_US_XFER, srp->frq_bm); + + if (us_xfer) { + ret = blk_rq_unmap_user(srp->bio); + if (ret) { /* -EINTR (-4) can be ignored */ + SG_LOG(6, sfp->parentdp, + "%s: blk_rq_unmap_user() --> %d\n", + __func__, ret); + } + } + srp->bio = NULL; } + /* In worst case READ data returned to user space by this point */ - if (srp->res_used) - sg_unlink_reserve(sfp, srp); - else - sg_remove_scat(sfp, req_schp); - - return ret; -} - -static int -sg_build_sgat(struct sg_scatter_hold *schp, const struct sg_fd *sfp, - int tablesize) -{ - int sg_bufflen = tablesize * sizeof(struct page *); - gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN; + /* Expect blk_put_request(rq) already called in sg_rq_end_io() */ + if (unlikely(!test_and_set_bit(SG_FRQ_BLK_PUT_REQ, srp->frq_bm))) { + struct request *rq = srp->rq; - schp->pages = kzalloc(sg_bufflen, gfp_flags); - if (!schp->pages) - return -ENOMEM; - schp->sglist_len = sg_bufflen; - return tablesize; /* number of scat_gath elements allocated */ + if (rq) { /* blk_get_request() may have failed */ + if (scsi_req(rq)) + scsi_req_free_cmd(scsi_req(rq)); + srp->rq = NULL; + blk_put_request(rq); + } + } } static int -sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp, - int buff_size) +sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen) { - int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems; - int sg_tablesize = sfp->parentdp->sg_tablesize; - int blk_size = buff_size, order; - gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO; + int j, k, rem_sz, order, align_sz; + int m_size = minlen; + int rup_sz = 0; + int mx_sgat_elems = sfp->parentdp->max_sgat_elems; + u32 elem_sz; + const size_t struct_page_sz = sizeof(struct page *); + gfp_t mask_ap = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO; + gfp_t mask_kz = GFP_ATOMIC | __GFP_NOWARN; struct sg_device *sdp = sfp->parentdp; + struct sg_scatter_hold *schp = &srp->sgat_h; - if (blk_size < 0) + if (unlikely(m_size < 0)) return -EFAULT; - if (0 == blk_size) - ++blk_size; /* don't know why */ - /* round request up to next highest SG_SECTOR_SZ byte boundary */ - blk_size = ALIGN(blk_size, SG_SECTOR_SZ); - SG_LOG(4, sfp->parentdp, "%s: buff_size=%d, blk_size=%d\n", - __func__, buff_size, blk_size); - - /* N.B. ret_sz carried into this block ... */ - mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize); - if (mx_sc_elems < 0) - return mx_sc_elems; /* most likely -ENOMEM */ - - num = scatter_elem_sz; - if (unlikely(num != scatter_elem_sz_prev)) { - if (num < PAGE_SIZE) { - scatter_elem_sz = PAGE_SIZE; - scatter_elem_sz_prev = PAGE_SIZE; - } else - scatter_elem_sz_prev = num; - } - - if (sdp->device->host->unchecked_isa_dma) - gfp_mask |= GFP_DMA; - - order = get_order(num); -retry: - ret_sz = 1 << (PAGE_SHIFT + order); - - for (k = 0, rem_sz = blk_size; rem_sz > 0 && k < mx_sc_elems; - k++, rem_sz -= ret_sz) { - - num = (rem_sz > scatter_elem_sz_prev) ? 
- scatter_elem_sz_prev : rem_sz; + if (unlikely(m_size == 0)) + ++m_size; /* don't remember why */ + /* round request up to next highest SG_DEF_SECTOR_SZ byte boundary */ + align_sz = ALIGN(m_size, SG_DEF_SECTOR_SZ); + + schp->pages = kcalloc(mx_sgat_elems, struct_page_sz, mask_kz); + SG_LOG(4, sdp, "%s: minlen=%d, align_sz=%d [sz=%zu, 0x%p ++]\n", + __func__, minlen, align_sz, mx_sgat_elems * struct_page_sz, + schp->pages); + if (unlikely(!schp->pages)) + return -ENOMEM; - schp->pages[k] = alloc_pages(gfp_mask, order); + elem_sz = sfp->sgat_elem_sz; /* power of 2 and >= PAGE_SIZE */ + if (sdp && unlikely(sdp->device->host->unchecked_isa_dma)) + mask_ap |= GFP_DMA; + /* PAGE_SIZE == (1 << PAGE_SHIFT) == (2 ** PAGE_SHIFT) */ + order = get_order(elem_sz); /* --> ilog2(1+((elem_sz-1)/PAGE_SIZE)) */ +again: + rup_sz = 1 << (PAGE_SHIFT + order); /* round up size */ + + for (k = 0, rem_sz = align_sz; rem_sz > 0 && k < mx_sgat_elems; + ++k, rem_sz -= rup_sz) { + schp->pages[k] = alloc_pages(mask_ap, order); if (!schp->pages[k]) - goto out; - - if (num == scatter_elem_sz_prev) { - if (unlikely(ret_sz > scatter_elem_sz_prev)) { - scatter_elem_sz = ret_sz; - scatter_elem_sz_prev = ret_sz; - } - } - SG_LOG(5, sfp->parentdp, "%s: k=%d, num=%d, ret_sz=%d\n", - __func__, k, num, ret_sz); - } /* end of for loop */ - + goto err_out; + SG_LOG(5, sdp, "%s: k=%d, rup_sz=%d [0x%p ++]\n", __func__, k, + rup_sz, schp->pages[k]); + } schp->page_order = order; - schp->k_use_sg = k; - SG_LOG(5, sfp->parentdp, "%s: k_use_sg=%d, order=%d\n", __func__, - k, order); - schp->bufflen = blk_size; - if (rem_sz > 0) /* must have failed */ + schp->num_sgat = k; + SG_LOG(5, sdp, "%s: num_sgat=%d, order=%d\n", __func__, k, order); + if (unlikely(rem_sz > 0)) /* must have failed */ return -ENOMEM; + schp->buflen = align_sz; return 0; -out: - for (i = 0; i < k; i++) - __free_pages(schp->pages[i], order); +err_out: + for (j = 0; j < k; ++j) + __free_pages(schp->pages[j], order); if (--order >= 0) - goto retry; - + goto again; + kfree(schp->pages); return -ENOMEM; } static void -sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp) -{ - SG_LOG(4, sfp->parentdp, "%s: num_sgat=%d\n", __func__, - schp->k_use_sg); - if (schp->pages && schp->sglist_len > 0) { - if (!schp->dio_in_use) { - int k; - - for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) { - SG_LOG(5, sfp->parentdp, - "%s: pg[%d]=0x%p --\n", __func__, k, - schp->pages[k]); - __free_pages(schp->pages[k], schp->page_order); - } - kfree(schp->pages); - } - } - memset(schp, 0, sizeof (*schp)); +sg_remove_sgat_helper(struct sg_device *sdp, struct sg_scatter_hold *schp) +{ + int k; + void *p; + + if (!schp->pages) + return; + for (k = 0; k < schp->num_sgat; ++k) { + p = schp->pages[k]; + SG_LOG(5, sdp, "%s: pg[%d]=0x%p --\n", __func__, k, p); + if (unlikely(!p)) + continue; + __free_pages(p, schp->page_order); + } + SG_LOG(5, sdp, "%s: pg_order=%u, free pgs=0x%p --\n", __func__, + schp->page_order, schp->pages); + kfree(schp->pages); +} + +/* Remove the data (possibly a sgat list) held by srp, not srp itself */ +static void +sg_remove_sgat(struct sg_request *srp) +{ + struct sg_scatter_hold *schp = &srp->sgat_h; /* care: remove own data */ + struct sg_fd *sfp = srp->parentfp; + struct sg_device *sdp; + + sdp = (sfp ? sfp->parentdp : NULL); + SG_LOG(4, sdp, "%s: num_sgat=%d%s\n", __func__, schp->num_sgat, + ((srp->parentfp ? (sfp->rsv_srp == srp) : false) ? 
+ " [rsv]" : "")); + if (!test_bit(SG_FRQ_DIO_IN_USE, srp->frq_bm)) + sg_remove_sgat_helper(sdp, schp); + + memset(schp, 0, sizeof(*schp)); /* zeros buflen and dlen */ } /* @@ -2049,241 +2674,514 @@ sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp) * appended to given struct sg_header object. */ static int -sg_rd_append(struct sg_request *srp, char __user *outp, int num_xfer) +sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer) { - struct sg_scatter_hold *schp = &srp->data; - int k, num; + int k, num, res; + struct page *pgp; + struct sg_scatter_hold *schp = &srp->sgat_h; SG_LOG(4, srp->parentfp->parentdp, "%s: num_xfer=%d\n", __func__, num_xfer); - if ((!outp) || (num_xfer <= 0)) - return 0; + if (unlikely(!outp || num_xfer <= 0)) + return (num_xfer == 0 && outp) ? 0 : -EINVAL; num = 1 << (PAGE_SHIFT + schp->page_order); - for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) { + for (k = 0, res = 0; k < schp->num_sgat; ++k) { + pgp = schp->pages[k]; + if (unlikely(!pgp)) { + res = -ENXIO; + break; + } if (num > num_xfer) { - if (__copy_to_user(outp, page_address(schp->pages[k]), - num_xfer)) - return -EFAULT; + if (__copy_to_user(outp, page_address(pgp), num_xfer)) + res = -EFAULT; break; } else { - if (__copy_to_user(outp, page_address(schp->pages[k]), - num)) - return -EFAULT; + if (__copy_to_user(outp, page_address(pgp), num)) { + res = -EFAULT; + break; + } num_xfer -= num; if (num_xfer <= 0) break; outp += num; } } - - return 0; + return res; } -static void -sg_build_reserve(struct sg_fd *sfp, int req_size) +/* + * If there are multiple requests outstanding, the speed of this function is + * important. SG_PACK_ID_WILDCARD is -1 and that case is typically + * the fast path. This function is only used in the non-blocking cases. + * Returns pointer to (first) matching sg_request or NULL. If found, + * sg_request state is moved from SG_RS_AWAIT_RD to SG_RS_BUSY. + */ +static struct sg_request * +sg_find_srp_by_id(struct sg_fd *sfp, int pack_id) { - struct sg_scatter_hold *schp = &sfp->reserve; - - SG_LOG(3, sfp ? 
sfp->parentdp : NULL, "%s: buflen=%d\n", __func__, - req_size); - do { - if (req_size < PAGE_SIZE) - req_size = PAGE_SIZE; - if (0 == sg_build_indirect(schp, sfp, req_size)) - return; - else - sg_remove_scat(sfp, schp); - req_size >>= 1; /* divide by 2 */ - } while (req_size > (PAGE_SIZE / 2)); + __maybe_unused bool is_bad_st = false; + __maybe_unused enum sg_rq_state bad_sr_st; + bool search_for_1 = (pack_id != SG_PACK_ID_WILDCARD); + enum sg_rq_state sr_st; + int res; + struct sg_request *srp = NULL; + + rcu_read_lock(); + if (unlikely(search_for_1)) { + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) + continue; + if (srp->pack_id != pack_id) + continue; + sr_st = atomic_read(&srp->rq_st); + switch (sr_st) { + case SG_RS_AWAIT_RD: + res = sg_rstate_chg(srp, sr_st, SG_RS_BUSY); + if (likely(res == 0)) + goto good; + /* else another caller got it, move on */ + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) { + is_bad_st = true; + bad_sr_st = atomic_read(&srp->rq_st); + } + break; + case SG_RS_INFLIGHT: + break; + default: + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) { + is_bad_st = true; + bad_sr_st = sr_st; + } + break; + } + break; /* problem if same id on > 1 requests */ + } + } else { /* search for any request is more likely */ + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) + continue; + sr_st = atomic_read(&srp->rq_st); + if (sr_st == SG_RS_AWAIT_RD) { + if (likely(sg_rstate_chg(srp, sr_st, + SG_RS_BUSY) == 0)) + goto good; + } + } + } + /* here if one of above loops does _not_ find a match */ + rcu_read_unlock(); + if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) { + if (search_for_1) { + struct sg_device *sdp = sfp->parentdp; + const char *cptp = "pack_id="; + + if (is_bad_st) + SG_LOG(1, sdp, "%s: %s%d wrong state: %s\n", + __func__, cptp, pack_id, + sg_rq_st_str(bad_sr_st, true)); + else + SG_LOG(6, sdp, "%s: %s%d not awaiting read\n", + __func__, cptp, pack_id); + } + } + return NULL; +good: + rcu_read_unlock(); + if (search_for_1) { + SG_LOG(6, sfp->parentdp, "%s: %s%d found [srp=0x%p]\n", + __func__, "pack_id=", pack_id, srp); + } + return srp; } -static void -sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size) +/* + * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which + * may take time but has improved chance of success, otherwise use GFP_ATOMIC. + * Note that basic initialization is done but srp is not added to either sfp + * list. On error returns twisted negated errno value (not NULL). 
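Seen from user space, the pack_id lookup above is what makes per-command
reaping work. An illustrative helper (not part of this patch; the function
name is invented, and the header is assumed to have been submitted with a
distinct pack_id) could be:

#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/*
 * Fetch the response for one specific command. With SG_SET_FORCE_PACK_ID
 * active, read() only completes a request whose pack_id matches the value
 * placed in the header that is passed in; -1 is the wildcard.
 */
static ssize_t read_by_pack_id(int fd, struct sg_io_hdr *hp, int pack_id)
{
    int one = 1;

    ioctl(fd, SG_SET_FORCE_PACK_ID, &one);
    hp->interface_id = 'S';
    hp->pack_id = pack_id;    /* which response we want */
    return read(fd, hp, sizeof(*hp));
}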
+ */ +static struct sg_request * +sg_mk_srp(struct sg_fd *sfp, bool first) { - struct sg_scatter_hold *req_schp = &srp->data; - struct sg_scatter_hold *rsv_schp = &sfp->reserve; - int k, num, rem; - - srp->res_used = 1; - SG_LOG(4, sfp->parentdp, "%s: size=%d\n", __func__, size); - rem = size; - - num = 1 << (PAGE_SHIFT + rsv_schp->page_order); - for (k = 0; k < rsv_schp->k_use_sg; k++) { - if (rem <= num) { - req_schp->k_use_sg = k + 1; - req_schp->sglist_len = rsv_schp->sglist_len; - req_schp->pages = rsv_schp->pages; + struct sg_request *srp; + int gfp = __GFP_NOWARN; - req_schp->bufflen = size; - req_schp->page_order = rsv_schp->page_order; - break; - } else - rem -= num; + if (first) /* prepared to wait if none already outstanding */ + srp = kzalloc(sizeof(*srp), gfp | GFP_KERNEL); + else + srp = kzalloc(sizeof(*srp), gfp | GFP_ATOMIC); + if (srp) { + spin_lock_init(&srp->req_lck); + atomic_set(&srp->rq_st, SG_RS_INACTIVE); + srp->parentfp = sfp; + return srp; + } else { + return ERR_PTR(-ENOMEM); } +} - if (k >= rsv_schp->k_use_sg) - SG_LOG(1, sfp->parentdp, "%s: BAD size\n", __func__); +static struct sg_request * +sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len) +{ + int res; + struct sg_request *n_srp = sg_mk_srp(sfp, first); + + if (IS_ERR(n_srp)) + return n_srp; + if (db_len > 0) { + res = sg_mk_sgat(n_srp, sfp, db_len); + if (res) { + kfree(n_srp); + return ERR_PTR(res); + } + } + return n_srp; } -static void -sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp) +/* + * Irrespective of the given reserve request size, the minimum size requested + * will be PAGE_SIZE (often 4096 bytes). Returns a pointer to reserve object or + * a negated errno value twisted by ERR_PTR() macro. The actual number of bytes + * allocated (maybe less than buflen) is in srp->sgat_h.buflen . Note that this + * function is only called in contexts where locking is not required. + */ +static struct sg_request * +sg_build_reserve(struct sg_fd *sfp, int buflen) { - struct sg_scatter_hold *req_schp = &srp->data; + bool go_out = false; + int res; + struct sg_request *srp; - SG_LOG(4, srp->parentfp->parentdp, "%s: req->k_use_sg=%d\n", __func__, - (int)req_schp->k_use_sg); - req_schp->k_use_sg = 0; - req_schp->bufflen = 0; - req_schp->pages = NULL; - req_schp->page_order = 0; - req_schp->sglist_len = 0; - srp->res_used = 0; - /* Called without mutex lock to avoid deadlock */ - sfp->res_in_use = 0; + SG_LOG(3, sfp ? sfp->parentdp : NULL, "%s: buflen=%d\n", __func__, + buflen); + srp = sg_mk_srp(sfp, list_empty(&sfp->rq_fl)); + if (IS_ERR(srp)) + return srp; + sfp->rsv_srp = srp; + do { + if (buflen < (int)PAGE_SIZE) { + buflen = PAGE_SIZE; + go_out = true; + } + res = sg_mk_sgat(srp, sfp, buflen); + if (res == 0) { + SG_LOG(4, sfp ? sfp->parentdp : NULL, + "%s: final buflen=%d, srp=0x%p ++\n", __func__, + buflen, srp); + return srp; + } + if (go_out) + return ERR_PTR(res); + /* failed so remove, halve buflen, try again */ + sg_remove_sgat(srp); + buflen >>= 1; /* divide by 2 */ + } while (true); } -/* always adds to end of list */ +/* + * Adds an active request (soon to carry a SCSI command) to the current file + * descriptor by creating a new one or re-using a request from the free + * list (fl). If successful returns a valid pointer in SG_RS_BUSY state. On + * failure returns a negated errno value twisted by ERR_PTR() macro. 
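The reuse policy described here is a conventional best-fit free list. A
generic user-space sketch follows (not driver code; all names are invented,
and it only loosely mirrors the ascending-buflen ordering that
sg_deact_request() maintains):

#include <stdlib.h>
#include <sys/queue.h>    /* BSD tail-queue macros, available in glibc */

struct buf_node {
    TAILQ_ENTRY(buf_node) link;
    int buflen;
    void *data;
};
TAILQ_HEAD(buf_list, buf_node);

/*
 * Return a free buffer whose capacity covers 'need', else allocate a new
 * one. With the list kept sorted by ascending buflen (zero-length nodes
 * at the tail), the first fit is also the snuggest fit. Caller must have
 * done TAILQ_INIT() on the list head.
 */
static struct buf_node *get_buf(struct buf_list *fl, int need)
{
    struct buf_node *np;

    TAILQ_FOREACH(np, fl, link) {
        if (np->buflen >= need) {
            TAILQ_REMOVE(fl, np, link);
            return np;
        }
    }
    np = calloc(1, sizeof(*np));
    if (np) {
        np->data = malloc(need);
        np->buflen = np->data ? need : 0;
    }
    return np;
}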
+ */ static struct sg_request * -sg_add_request(struct sg_fd *sfp) +sg_add_request(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp) { - int k; + bool act_empty = false; + bool mk_new_srp = true; unsigned long iflags; - struct sg_request *rp = sfp->req_arr; + enum sg_rq_state sr_st; + struct sg_request *r_srp = NULL; /* request to return */ + struct sg_request *rsv_srp; /* current fd's reserve request */ + __maybe_unused struct sg_device *sdp; + __maybe_unused const char *cp; + + spin_lock_irqsave(&sfp->rq_list_lock, iflags); + sdp = sfp->parentdp; + rsv_srp = sfp->rsv_srp; + cp = ""; + /* + * Check the free list (fl) for candidates. Pick zero data length + * requests from the back of the fl, the rest from the front. + */ + if (list_empty(&sfp->rq_fl)) { + act_empty = true; + } else if (dxfr_len < 1) { /* 0 data length requests at back of fl */ + list_for_each_entry_reverse(r_srp, &sfp->rq_fl, fl_entry) { + sr_st = atomic_read(&r_srp->rq_st); + if (sr_st == SG_RS_INACTIVE) { + if (likely(sg_rstate_chg(r_srp, sr_st, + SG_RS_BUSY) == 0)) { + cp = "from back of fl"; + mk_new_srp = false; + break; + } + } + } + } else { /* find request with large enough dlen */ + list_for_each_entry(r_srp, &sfp->rq_fl, fl_entry) { + sr_st = atomic_read(&r_srp->rq_st); + if (sr_st == SG_RS_INACTIVE && + r_srp->sgat_h.buflen >= dxfr_len) { + if (likely(sg_rstate_chg(r_srp, sr_st, + SG_RS_BUSY) == 0)) { + cp = "from front of fl"; + mk_new_srp = false; + break; + } /* else other got it, move on */ + } + } + } - write_lock_irqsave(&sfp->rq_list_lock, iflags); - if (!list_empty(&sfp->rq_list)) { - if (!sfp->cmd_q) - goto out_unlock; + if (mk_new_srp) { /* Need new sg_request object */ + bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, sfp->ffd_bm); - for (k = 0; k < SG_MAX_QUEUE; ++k, ++rp) { - if (!rp->parentfp) - break; + r_srp = NULL; + if (!allow_cmd_q && !list_empty(&sfp->rq_list)) { + r_srp = ERR_PTR(-EDOM); + SG_LOG(6, sdp, "%s: trying 2nd req but cmd_q=false\n", + __func__); } - if (k >= SG_MAX_QUEUE) - goto out_unlock; - } - memset(rp, 0, sizeof(struct sg_request)); - rp->parentfp = sfp; - rp->header.duration = jiffies_to_msecs(jiffies); - list_add_tail(&rp->entry, &sfp->rq_list); - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return rp; -out_unlock: - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return NULL; + spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); + if (IS_ERR(r_srp)) /* NULL is not an ERR here */ + goto err_no_lock; + /* releasing rq_list_lock because next line could take time */ + r_srp = sg_mk_srp_sgat(sfp, act_empty, dxfr_len); + if (IS_ERR(r_srp)) + goto err_no_lock; + cp = "new"; + SG_LOG(4, sdp, "%s: mk_new_srp=0x%p ++\n", __func__, r_srp); + atomic_set(&r_srp->rq_st, SG_RS_BUSY); + } else { /* otherwise found srp is on fl, remove from fl */ + list_del_rcu(&r_srp->fl_entry); + r_srp->in_resid = 0; + r_srp->rq_info = 0; + r_srp->sense_len = 0; + } + if (!mk_new_srp) + spin_lock(&r_srp->req_lck); + r_srp->frq_bm[0] = cwrp->frq_bm[0]; /* assumes <= 32 req flags */ + r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */ + r_srp->cmd_opcode = 0xff; /* set invalid opcode (VS), 0x0 is TUR */ + if (mk_new_srp) + spin_lock_irqsave(&sfp->rq_list_lock, iflags); + list_add_tail_rcu(&r_srp->rq_entry, &sfp->rq_list); + if (!mk_new_srp) + spin_unlock(&r_srp->req_lck); + spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); +err_no_lock: + if (IS_ERR(r_srp)) + SG_LOG(1, sdp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp)); + if (!IS_ERR(r_srp)) + SG_LOG(4, sdp, "%s: %s 
%sr_srp=0x%p\n", __func__, cp, + ((r_srp == rsv_srp) ? "[rsv] " : ""), r_srp); + return r_srp; } -/* Return of 1 for found; 0 for not found */ -static int -sg_remove_request(struct sg_fd *sfp, struct sg_request *srp) +/* + * Moves a completed sg_request object to the free list and sets it to + * SG_RS_INACTIVE which makes it available for re-use. Requests with no data + * associated are appended to the tail of the free list while other requests + * are prepended to the head of the free list. + */ +static void +sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) { + bool on_fl = false; + int dlen, buflen; unsigned long iflags; - int res = 0; + struct sg_request *t_srp; + struct sg_scatter_hold *schp; + const char *cp = "head"; - if (!sfp || !srp || list_empty(&sfp->rq_list)) - return res; - write_lock_irqsave(&sfp->rq_list_lock, iflags); - if (!list_empty(&srp->entry)) { - list_del(&srp->entry); - srp->parentfp = NULL; - res = 1; + if (WARN_ON(!sfp || !srp)) + return; + schp = &srp->sgat_h; /* make sure it is own data buffer */ + spin_lock_irqsave(&sfp->rq_list_lock, iflags); + atomic_set(&srp->rq_st, SG_RS_BUSY); + list_del_rcu(&srp->rq_entry); + /* + * N.B. sg_request object is not de-allocated (freed). The contents + * of the rq_list and rq_fl lists are de-allocated (freed) when + * the owning file descriptor is closed. The free list acts as a LIFO + * for same size (buflen) sg_request objects. This can improve the + * chance of a cache hit when the sg_request object is re-used. + */ + dlen = schp->dlen; + buflen = schp->buflen; + /* want ascending free list by dlen, but dlen==0 at end */ + if (dlen > 0) { + list_for_each_entry(t_srp, &sfp->rq_fl, fl_entry) { + if (buflen <= t_srp->sgat_h.buflen || + t_srp->sgat_h.buflen == 0) { + /* add srp _before_ t_srp on fl */ + list_add_tail_rcu(&srp->fl_entry, + &t_srp->fl_entry); + on_fl = true; + cp = "ascending pos"; + break; + } + } } - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); - return res; + if (!on_fl) { /* either empty list, dlen=0 or buflen highest */ + list_add_tail_rcu(&srp->fl_entry, &sfp->rq_fl); + cp = "tail"; + } + SG_LOG(5, sfp->parentdp, "%s: %ssrp=0x%p move to fl %s\n", __func__, + ((sfp->rsv_srp == srp) ? 
"rsv " : ""), srp, cp); + spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); + atomic_set(&srp->rq_st, SG_RS_INACTIVE); } +/* Returns pointer to sg_fd object or negated errno twisted by ERR_PTR */ static struct sg_fd * sg_add_sfp(struct sg_device *sdp) { - struct sg_fd *sfp; + bool reduced = false; + int rbuf_len; unsigned long iflags; - int bufflen; + long err; + struct sg_fd *sfp; + struct sg_request *srp; sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN); - if (!sfp) + if (!sfp) { + SG_LOG(1, sdp, "%s: sfp allocation failed\n", __func__); return ERR_PTR(-ENOMEM); - + } init_waitqueue_head(&sfp->read_wait); - rwlock_init(&sfp->rq_list_lock); + spin_lock_init(&sfp->rq_list_lock); INIT_LIST_HEAD(&sfp->rq_list); + INIT_LIST_HEAD(&sfp->rq_fl); kref_init(&sfp->f_ref); mutex_init(&sfp->f_mutex); sfp->timeout = SG_DEFAULT_TIMEOUT; sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER; - sfp->force_packid = SG_DEF_FORCE_PACK_ID; - sfp->cmd_q = SG_DEF_COMMAND_Q; - sfp->keep_orphan = SG_DEF_KEEP_ORPHAN; + /* other bits in sfp->ffd_bm[1] cleared by kzalloc() above */ + assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID); + assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q); + assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN); + assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT); + /* + * SG_SCATTER_SZ initializes scatter_elem_sz but different value may + * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192'). + * Any user provided number will be changed to be PAGE_SIZE as a + * minimum, otherwise it will be rounded down (if required) to a + * power of 2. So it will always be a power of 2. + */ + sfp->sgat_elem_sz = scatter_elem_sz; sfp->parentdp = sdp; - write_lock_irqsave(&sdp->sfd_lock, iflags); if (atomic_read(&sdp->detaching)) { - write_unlock_irqrestore(&sdp->sfd_lock, iflags); kfree(sfp); + SG_LOG(1, sdp, "%s: detaching\n", __func__); return ERR_PTR(-ENODEV); } - list_add_tail(&sfp->sfd_siblings, &sdp->sfds); - write_unlock_irqrestore(&sdp->sfd_lock, iflags); - SG_LOG(3, sdp, "%s: sfp=0x%p\n", __func__, sfp); if (unlikely(sg_big_buff != def_reserved_size)) sg_big_buff = def_reserved_size; - bufflen = min_t(int, sg_big_buff, - max_sectors_bytes(sdp->device->request_queue)); - sg_build_reserve(sfp, bufflen); - SG_LOG(3, sdp, "%s: bufflen=%d, k_use_sg=%d\n", __func__, - sfp->reserve.bufflen, sfp->reserve.k_use_sg); - + rbuf_len = min_t(int, sg_big_buff, sdp->max_sgat_sz); + if (rbuf_len > 0) { + srp = sg_build_reserve(sfp, rbuf_len); + if (IS_ERR(srp)) { + kfree(sfp); + err = PTR_ERR(srp); + SG_LOG(1, sdp, "%s: build reserve err=%ld\n", __func__, + -err); + return ERR_PTR(err); + } + if (srp->sgat_h.buflen < rbuf_len) { + reduced = true; + SG_LOG(2, sdp, + "%s: reserve reduced from %d to buflen=%d\n", + __func__, rbuf_len, srp->sgat_h.buflen); + } + /* will be first element so head or tail doesn't matter */ + list_add_tail_rcu(&srp->fl_entry, &sfp->rq_fl); + } + if (!reduced) { + SG_LOG(4, sdp, "%s: built reserve buflen=%d\n", __func__, + rbuf_len); + } + write_lock_irqsave(&sdp->sfd_llock, iflags); + list_add_tail(&sfp->sfd_entry, &sdp->sfds); kref_get(&sdp->d_ref); __module_get(THIS_MODULE); + write_unlock_irqrestore(&sdp->sfd_llock, iflags); + SG_LOG(3, sdp, "%s: success, sfp=0x%p ++\n", __func__, sfp); return sfp; } +/* + * A successful call to sg_release() will result, at some later time, to this + * function being invoked. 
All requests associated with this file descriptor + * should be completed or cancelled when this function is called (due to + * sfp->f_ref). Also the file descriptor itself has not been accessible since + * it was list_del()-ed by the preceding sg_remove_sfp() call. So no locking + * is required. sdp should never be NULL but to make debugging more robust, + * this function will not blow up in that case. + */ static void sg_remove_sfp_usercontext(struct work_struct *work) { - struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work); - struct sg_device *sdp = sfp->parentdp; + struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work); + struct sg_device *sdp; struct sg_request *srp; - unsigned long iflags; - - /* Cleanup any responses which were never read(). */ - write_lock_irqsave(&sfp->rq_list_lock, iflags); - while (!list_empty(&sfp->rq_list)) { - srp = list_first_entry(&sfp->rq_list, struct sg_request, entry); - sg_finish_rem_req(srp); - list_del(&srp->entry); - srp->parentfp = NULL; - } - write_unlock_irqrestore(&sfp->rq_list_lock, iflags); + __maybe_unused const char *cp = " srp=0x"; - if (sfp->reserve.bufflen > 0) { - SG_LOG(6, sdp, "%s: bufflen=%d, k_use_sg=%d\n", __func__, - (int)sfp->reserve.bufflen, (int)sfp->reserve.k_use_sg); - sg_remove_scat(sfp, &sfp->reserve); + if (!sfp) { + pr_warn("sg: %s: sfp is NULL\n", __func__); + return; } + sdp = sfp->parentdp; - SG_LOG(6, sdp, "%s: sfp=0x%p\n", __func__, sfp); + /* Cleanup any responses which were never read(). */ + while (!list_empty(&sfp->rq_list)) { + srp = list_last_entry(&sfp->rq_list, struct sg_request, + rq_entry); + if (!srp) + continue; + sg_finish_scsi_blk_rq(srp); + list_del(&srp->rq_entry); + if (srp->sgat_h.buflen > 0) + sg_remove_sgat(srp); + SG_LOG(6, sdp, "%s:%s%p --\n", __func__, cp, srp); + kfree(srp); + } + while (!list_empty(&sfp->rq_fl)) { + srp = list_last_entry(&sfp->rq_fl, struct sg_request, + fl_entry); + if (!srp) + continue; + list_del(&srp->fl_entry); + if (srp->sgat_h.buflen > 0) + sg_remove_sgat(srp); + SG_LOG(6, sdp, "%s: fl%s%p --\n", __func__, cp, srp); + kfree(srp); + } + SG_LOG(3, sdp, "%s: kfree: sfp=0x%p --\n", __func__, sfp); kfree(sfp); - scsi_device_put(sdp->device); - kref_put(&sdp->d_ref, sg_device_destroy); + if (sdp) { + scsi_device_put(sdp->device); + kref_put(&sdp->d_ref, sg_device_destroy); + } module_put(THIS_MODULE); } static void sg_remove_sfp(struct kref *kref) { + unsigned long iflags; struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref); struct sg_device *sdp = sfp->parentdp; - unsigned long iflags; - write_lock_irqsave(&sdp->sfd_lock, iflags); - list_del(&sfp->sfd_siblings); - write_unlock_irqrestore(&sdp->sfd_lock, iflags); + write_lock_irqsave(&sdp->sfd_llock, iflags); + list_del(&sfp->sfd_entry); + write_unlock_irqrestore(&sdp->sfd_llock, iflags); - INIT_WORK(&sfp->ew.work, sg_remove_sfp_usercontext); - schedule_work(&sfp->ew.work); + INIT_WORK(&sfp->ew_fd.work, sg_remove_sfp_usercontext); + schedule_work(&sfp->ew_fd.work); } static int @@ -2326,10 +3224,37 @@ sg_get_dev(int dev) return sdp; } +#if IS_ENABLED(CONFIG_SCSI_PROC_FS) +static const char * +sg_rq_st_str(enum sg_rq_state rq_st, bool long_str) +{ + switch (rq_st) { /* request state */ + case SG_RS_INACTIVE: + return long_str ? "inactive" : "ina"; + case SG_RS_INFLIGHT: + return long_str ? "inflight" : "act"; + case SG_RS_AWAIT_RD: + return long_str ? "await_read" : "rcv"; + case SG_RS_DONE_RD: + return long_str ? "done_read" : "fin"; + case SG_RS_BUSY: + return long_str ? 
"busy" : "bsy"; + default: + return long_str ? "unknown" : "unk"; + } +} +#else +static const char * +sg_rq_st_str(enum sg_rq_state rq_st, bool long_str) +{ + return ""; +} +#endif + #if IS_ENABLED(CONFIG_SCSI_PROC_FS) /* long, almost to end of file */ static int sg_proc_seq_show_int(struct seq_file *s, void *v); -static int sg_proc_single_open_adio(struct inode *inode, struct file *file); +static int sg_proc_single_open_adio(struct inode *inode, struct file *filp); static ssize_t sg_proc_write_adio(struct file *filp, const char __user *buffer, size_t count, loff_t *off); static const struct file_operations adio_fops = { @@ -2341,7 +3266,7 @@ static const struct file_operations adio_fops = { .release = single_release, }; -static int sg_proc_single_open_dressz(struct inode *inode, struct file *file); +static int sg_proc_single_open_dressz(struct inode *inode, struct file *filp); static ssize_t sg_proc_write_dressz(struct file *filp, const char __user *buffer, size_t count, loff_t *off); static const struct file_operations dressz_fops = { @@ -2374,12 +3299,12 @@ static const struct seq_operations devstrs_seq_ops = { .show = sg_proc_seq_show_devstrs, }; -static int sg_proc_seq_show_debug(struct seq_file *s, void *v); -static const struct seq_operations debug_seq_ops = { +static int sg_proc_seq_show_dbg(struct seq_file *s, void *v); +static const struct seq_operations dbg_seq_ops = { .start = dev_seq_start, .next = dev_seq_next, .stop = dev_seq_stop, - .show = sg_proc_seq_show_debug, + .show = sg_proc_seq_show_dbg, }; static int @@ -2391,13 +3316,13 @@ sg_proc_init(void) if (!p) return 1; - proc_create("allow_dio", S_IRUGO | S_IWUSR, p, &adio_fops); - proc_create_seq("debug", S_IRUGO, p, &debug_seq_ops); - proc_create("def_reserved_size", S_IRUGO | S_IWUSR, p, &dressz_fops); - proc_create_single("device_hdr", S_IRUGO, p, sg_proc_seq_show_devhdr); - proc_create_seq("devices", S_IRUGO, p, &dev_seq_ops); - proc_create_seq("device_strs", S_IRUGO, p, &devstrs_seq_ops); - proc_create_single("version", S_IRUGO, p, sg_proc_seq_show_version); + proc_create("allow_dio", 0644, p, &adio_fops); + proc_create_seq("debug", 0444, p, &dbg_seq_ops); + proc_create("def_reserved_size", 0644, p, &dressz_fops); + proc_create_single("device_hdr", 0444, p, sg_proc_seq_show_devhdr); + proc_create_seq("devices", 0444, p, &dev_seq_ops); + proc_create_seq("device_strs", 0444, p, &devstrs_seq_ops); + proc_create_single("version", 0444, p, sg_proc_seq_show_version); return 0; } @@ -2421,9 +3346,9 @@ sg_proc_seq_show_int(struct seq_file *s, void *v) } static int -sg_proc_single_open_adio(struct inode *inode, struct file *file) +sg_proc_single_open_adio(struct inode *inode, struct file *filp) { - return single_open(file, sg_proc_seq_show_int, &sg_allow_dio); + return single_open(filp, sg_proc_seq_show_int, &sg_allow_dio); } static ssize_t @@ -2443,9 +3368,9 @@ sg_proc_write_adio(struct file *filp, const char __user *buffer, } static int -sg_proc_single_open_dressz(struct inode *inode, struct file *file) +sg_proc_single_open_dressz(struct inode *inode, struct file *filp) { - return single_open(file, sg_proc_seq_show_int, &sg_big_buff); + return single_open(filp, sg_proc_seq_show_int, &sg_big_buff); } static ssize_t @@ -2486,12 +3411,13 @@ sg_proc_seq_show_devhdr(struct seq_file *s, void *v) struct sg_proc_deviter { loff_t index; size_t max; + int fd_index; }; static void * dev_seq_start(struct seq_file *s, loff_t *pos) { - struct sg_proc_deviter * it = kmalloc(sizeof(*it), GFP_KERNEL); + struct sg_proc_deviter *it = 
kzalloc(sizeof(*it), GFP_KERNEL); s->private = it; if (! it) @@ -2507,7 +3433,7 @@ dev_seq_start(struct seq_file *s, loff_t *pos) static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos) { - struct sg_proc_deviter * it = s->private; + struct sg_proc_deviter *it = s->private; *pos = ++it->index; return (it->index < it->max) ? it : NULL; @@ -2522,7 +3448,7 @@ dev_seq_stop(struct seq_file *s, void *v) static int sg_proc_seq_show_dev(struct seq_file *s, void *v) { - struct sg_proc_deviter * it = (struct sg_proc_deviter *) v; + struct sg_proc_deviter *it = (struct sg_proc_deviter *)v; struct sg_device *sdp; struct scsi_device *scsidp; unsigned long iflags; @@ -2549,7 +3475,7 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v) static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v) { - struct sg_proc_deviter * it = (struct sg_proc_deviter *) v; + struct sg_proc_deviter *it = (struct sg_proc_deviter *)v; struct sg_device *sdp; struct scsi_device *scsidp; unsigned long iflags; @@ -2566,111 +3492,196 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v) return 0; } -/* must be called while holding sg_index_lock */ -static void -sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp) +/* Writes debug info for one sg_request in obp buffer */ +static int +sg_proc_dbg_sreq(struct sg_request *srp, int to, char *obp, int len) + __must_hold(&srp->req_lck) +{ + bool is_v3, is_dur; + int n = 0; + u32 dur; + enum sg_rq_state rq_st; + const char *cp; + + if (len < 1) + return 0; + is_v3 = (srp->s_hdr3.interface_id != '\0'); + if (srp->parentfp->rsv_srp == srp) + cp = (is_v3 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ? + " mmap>> " : " rsv>> "; + else + cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ? + " dio>> " : " "; + rq_st = atomic_read(&srp->rq_st); + dur = sg_get_dur(srp, &rq_st, &is_dur); + n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp, + sg_rq_st_str(rq_st, false), srp->sgat_h.dlen, + srp->sgat_h.buflen, (int)srp->pack_id); + if (is_dur) /* cmd/req has completed, waiting for ... */ + n += scnprintf(obp + n, len - n, " dur=%ums", dur); + else if (dur < U32_MAX) /* in-flight or busy (so ongoing) */ + n += scnprintf(obp + n, len - n, " t_o/elap=%us/%ums", + to / 1000, dur); + n += scnprintf(obp + n, len - n, " sgat=%d op=0x%02x\n", + srp->sgat_h.num_sgat, srp->cmd_opcode); + return n; +} + +/* Writes debug info for one sg fd (including its sg requests) in obp buffer */ +static int +sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) + __must_hold(&sfp->rq_list_lock) { - int k, new_interface, blen, usg; + bool first_fl; + int n = 0; + int to; struct sg_request *srp; - struct sg_fd *fp; - const sg_io_hdr_t *hp; - const char * cp; - unsigned int ms; - - k = 0; - list_for_each_entry(fp, &sdp->sfds, sfd_siblings) { - k++; - read_lock(&fp->rq_list_lock); /* irqs already disabled */ - seq_printf(s, " FD(%d): timeout=%dms bufflen=%d " - "(res)sgat=%d low_dma=%d\n", k, - jiffies_to_msecs(fp->timeout), - fp->reserve.bufflen, - (int) fp->reserve.k_use_sg, - (int) sdp->device->host->unchecked_isa_dma); - seq_printf(s, " cmd_q=%d f_packid=%d k_orphan=%d closed=0\n", - (int) fp->cmd_q, (int) fp->force_packid, - (int) fp->keep_orphan); - list_for_each_entry(srp, &fp->rq_list, entry) { - hp = &srp->header; - new_interface = (hp->interface_id == '\0') ? 
0 : 1; - if (srp->res_used) { - if (new_interface && - (SG_FLAG_MMAP_IO & hp->flags)) - cp = " mmap>> "; - else - cp = " rb>> "; - } else { - if (SG_INFO_DIRECT_IO_MASK & hp->info) - cp = " dio>> "; - else - cp = " "; - } - seq_puts(s, cp); - blen = srp->data.bufflen; - usg = srp->data.k_use_sg; - seq_puts(s, srp->done ? - ((1 == srp->done) ? "rcv:" : "fin:") - : "act:"); - seq_printf(s, " id=%d blen=%d", - srp->header.pack_id, blen); - if (srp->done) - seq_printf(s, " dur=%d", hp->duration); - else { - ms = jiffies_to_msecs(jiffies); - seq_printf(s, " t_o/elap=%d/%d", - (new_interface ? hp->timeout : - jiffies_to_msecs(fp->timeout)), - (ms > hp->duration ? ms - hp->duration : 0)); - } - seq_printf(s, "ms sgat=%d op=0x%02x\n", usg, - (int) srp->data.cmd_opcode); + + /* sgat=-1 means unavailable */ + to = jiffies_to_msecs(fp->timeout); + if (to % 1000) + n += scnprintf(obp + n, len - n, "timeout=%dms rs", to); + else + n += scnprintf(obp + n, len - n, "timeout=%ds rs", to / 1000); + n += scnprintf(obp + n, len - n, "v_buflen=%d\n cmd_q=%d ", + fp->rsv_srp->sgat_h.buflen, + (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm)); + n += scnprintf(obp + n, len - n, + "f_packid=%d k_orphan=%d ffd_bm=0x%lx\n", + (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm), + (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm), + fp->ffd_bm[0]); + n += scnprintf(obp + n, len - n, " mmap_called=%d\n", + test_bit(SG_FFD_MMAP_CALLED, fp->ffd_bm)); + n += scnprintf(obp + n, len - n, + " submitted=%d waiting=%d open thr_id=%d\n", + atomic_read(&fp->submitted), + atomic_read(&fp->waiting), fp->tid); + list_for_each_entry_rcu(srp, &fp->rq_list, rq_entry) { + spin_lock(&srp->req_lck); + n += sg_proc_dbg_sreq(srp, fp->timeout, obp + n, len - n); + spin_unlock(&srp->req_lck); + } + if (list_empty(&fp->rq_list)) + n += scnprintf(obp + n, len - n, " No requests active\n"); + first_fl = true; + list_for_each_entry_rcu(srp, &fp->rq_fl, fl_entry) { + if (first_fl) { + n += scnprintf(obp + n, len - n, " Free list:\n"); + first_fl = false; } - if (list_empty(&fp->rq_list)) - seq_puts(s, " No requests active\n"); - read_unlock(&fp->rq_list_lock); + spin_lock(&srp->req_lck); + n += sg_proc_dbg_sreq(srp, fp->timeout, obp + n, len - n); + spin_unlock(&srp->req_lck); } + return n; } +/* Writes debug info for one sg device (including its sg fds) in obp buffer */ static int -sg_proc_seq_show_debug(struct seq_file *s, void *v) +sg_proc_dbg_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp) + __must_hold(&sdp->sfd_llock) { - struct sg_proc_deviter * it = (struct sg_proc_deviter *) v; - struct sg_device *sdp; + int n = 0; + int my_count = 0; + struct scsi_device *ssdp = sdp->device; + struct sg_fd *fp; + char *disk_name; + int *countp; + + countp = fd_counterp ? fd_counterp : &my_count; + disk_name = (sdp->disk ? 
sdp->disk->disk_name : "?_?"); + n += scnprintf(obp + n, len - n, " >>> device=%s ", disk_name); + n += scnprintf(obp + n, len - n, "%d:%d:%d:%llu ", ssdp->host->host_no, + ssdp->channel, ssdp->id, ssdp->lun); + n += scnprintf(obp + n, len - n, + " max_sgat_sz,elems=2^%d,%d excl=%d open_cnt=%d\n", + ilog2(sdp->max_sgat_sz), sdp->max_sgat_elems, + sdp->exclude, sdp->open_cnt); + list_for_each_entry(fp, &sdp->sfds, sfd_entry) { + ++*countp; + rcu_read_lock(); /* assume irqs disabled */ + n += scnprintf(obp + n, len - n, " FD(%d): ", *countp); + n += sg_proc_dbg_fd(fp, obp + n, len - n); + rcu_read_unlock(); + } + return n; +} + +/* Called via dbg_seq_ops once for each sg device */ +static int +sg_proc_seq_show_dbg(struct seq_file *s, void *v) +{ + bool found = false; + bool trunc = false; + const int bp_len = SG_PROC_DEBUG_SZ; + int n = 0; + int k = 0; unsigned long iflags; + struct sg_proc_deviter *it = (struct sg_proc_deviter *)v; + struct sg_device *sdp; + int *fdi_p; + char *bp; + char *disk_name; + char b1[128]; + b1[0] = '\0'; if (it && (0 == it->index)) seq_printf(s, "max_active_device=%d def_reserved_size=%d\n", - (int)it->max, sg_big_buff); - + (int)it->max, def_reserved_size); + fdi_p = it ? &it->fd_index : &k; + bp = kzalloc(bp_len, __GFP_NOWARN | GFP_KERNEL); + if (!bp) { + seq_printf(s, "%s: Unable to allocate %d on heap, finish\n", + __func__, bp_len); + return -1; + } read_lock_irqsave(&sg_index_lock, iflags); sdp = it ? sg_lookup_dev(it->index) : NULL; if (NULL == sdp) goto skip; - read_lock(&sdp->sfd_lock); + read_lock(&sdp->sfd_llock); if (!list_empty(&sdp->sfds)) { - seq_printf(s, " >>> device=%s ", sdp->disk->disk_name); + found = true; + disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?"); if (atomic_read(&sdp->detaching)) - seq_puts(s, "detaching pending close "); + snprintf(b1, sizeof(b1), " >>> device=%s %s\n", + disk_name, "detaching pending close\n"); else if (sdp->device) { - struct scsi_device *scsidp = sdp->device; - - seq_printf(s, "%d:%d:%d:%llu em=%d", - scsidp->host->host_no, - scsidp->channel, scsidp->id, - scsidp->lun, - scsidp->host->hostt->emulated); + n = sg_proc_dbg_sdev(sdp, bp, bp_len, fdi_p); + if (n >= bp_len - 1) { + trunc = true; + if (bp[bp_len - 2] != '\n') + bp[bp_len - 2] = '\n'; + } + } else { + snprintf(b1, sizeof(b1), " >>> device=%s %s\n", + disk_name, "sdp->device==NULL, skip"); } - seq_printf(s, " sg_tablesize=%d excl=%d open_cnt=%d\n", - sdp->sg_tablesize, sdp->exclude, sdp->open_cnt); - sg_proc_debug_helper(s, sdp); } - read_unlock(&sdp->sfd_lock); + read_unlock(&sdp->sfd_llock); skip: read_unlock_irqrestore(&sg_index_lock, iflags); + if (found) { + if (n > 0) { + seq_puts(s, bp); + if (seq_has_overflowed(s)) + goto s_ovfl; + if (trunc) + seq_printf(s, " >> Output truncated %s\n", + "due to buffer size"); + } else if (b1[0]) { + seq_puts(s, b1); + if (seq_has_overflowed(s)) + goto s_ovfl; + } + } +s_ovfl: + kfree(bp); return 0; } -#endif /* CONFIG_SCSI_PROC_FS (~800 lines back) */ +#endif /* CONFIG_SCSI_PROC_FS (~600 lines back) */ module_init(init_sg); module_exit(exit_sg); From patchwork Fri May 24 18:47:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960323 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9401315A6 for ; Fri, 24 May 2019 18:48:28 +0000 (UTC) Received: from mail.wl.linuxfoundation.org 
(localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8511128A21 for ; Fri, 24 May 2019 18:48:28 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7979328A22; Fri, 24 May 2019 18:48:28 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0DACC28A24 for ; Fri, 24 May 2019 18:48:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391575AbfEXSs1 (ORCPT ); Fri, 24 May 2019 14:48:27 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56354 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391503AbfEXSs1 (ORCPT ); Fri, 24 May 2019 14:48:27 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 40B85204199; Fri, 24 May 2019 20:48:26 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EshWVel4bZfI; Fri, 24 May 2019 20:48:25 +0200 (CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id 8B1DF204165; Fri, 24 May 2019 20:48:18 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 06/19] sg: sense buffer cleanup Date: Fri, 24 May 2019 14:47:56 -0400 Message-Id: <20190524184809.25121-7-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Only a small percentage of SCSI commands require a sense buffer. Allocate one as needed and free it as soon as possible.
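For orientation, the sense buffer being managed here is only ever seen by user space through the buffer it supplies in the interface header; the driver copies into it when a command yields CHECK CONDITION or driver sense. The following is a minimal, illustrative v3-interface sketch of that contract (not part of this patch; /dev/sg0 and the TEST UNIT READY CDB are placeholders):

  /* sketch: receiving sense data via the v3 sg interface */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <scsi/sg.h>

  int main(void)
  {
  	unsigned char cdb[6] = { 0x00, 0, 0, 0, 0, 0 }; /* TEST UNIT READY */
  	unsigned char sense[32];
  	struct sg_io_hdr io;
  	int fd = open("/dev/sg0", O_RDWR);

  	if (fd < 0)
  		return 1;
  	memset(&io, 0, sizeof(io));
  	io.interface_id = 'S';		/* marks a v3 object */
  	io.cmdp = cdb;
  	io.cmd_len = sizeof(cdb);
  	io.dxfer_direction = SG_DXFER_NONE;
  	io.sbp = sense;			/* driver copies sense data here */
  	io.mx_sb_len = sizeof(sense);
  	io.timeout = 5000;		/* milliseconds */
  	if (ioctl(fd, SG_IO, &io) < 0)
  		return 1;
  	if (io.sb_len_wr > 0)		/* non-zero only when sense arrived */
  		printf("received %d bytes of sense data\n", io.sb_len_wr);
  	close(fd);
  	return 0;
  }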
Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 72ce51b3198c..a58875198c16 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -852,20 +852,21 @@ sg_copy_sense(struct sg_request *srp) (driver_byte(srp->rq_result) & DRIVER_SENSE)) { int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len); int mx_sb_len; + u8 *sbp = srp->sense_bp; void __user *up; + srp->sense_bp = NULL; up = (void __user *)srp->s_hdr3.sbp; mx_sb_len = srp->s_hdr3.mx_sb_len; - if (up && mx_sb_len > 0 && srp->sense_bp) { + if (up && mx_sb_len > 0 && sbp) { sb_len = min_t(int, sb_len, mx_sb_len); /* Additional sense length field */ - sb_len_wr = 8 + (int)srp->sense_bp[7]; + sb_len_wr = 8 + (int)sbp[7]; sb_len_wr = min_t(int, sb_len, sb_len_wr); - if (copy_to_user(up, srp->sense_bp, sb_len_wr)) + if (copy_to_user(up, sbp, sb_len_wr)) sb_len_wr = -EFAULT; } - kfree(srp->sense_bp); - srp->sense_bp = NULL; + kfree(sbp); } return sb_len_wr; } @@ -972,12 +973,9 @@ sg_rd_v1v2(void __user *buf, int count, struct sg_fd *sfp, h2p->driver_status = driver_byte(rq_result); if ((CHECK_CONDITION & status_byte(rq_result)) || (DRIVER_SENSE & driver_byte(rq_result))) { - if (srp->sense_bp) { + if (srp->sense_bp) memcpy(h2p->sense_buffer, srp->sense_bp, sizeof(h2p->sense_buffer)); - kfree(srp->sense_bp); - srp->sense_bp = NULL; - } } switch (host_byte(rq_result)) { /* @@ -1013,17 +1011,22 @@ sg_rd_v1v2(void __user *buf, int count, struct sg_fd *sfp, /* Now copy the result back to the user buffer. */ if (count >= SZ_SG_HEADER) { - if (copy_to_user(buf, h2p, SZ_SG_HEADER)) - return -EFAULT; + if (copy_to_user(buf, h2p, SZ_SG_HEADER)) { + res = -EFAULT; + goto fini; + } buf += SZ_SG_HEADER; if (count > h2p->reply_len) count = h2p->reply_len; if (count > SZ_SG_HEADER) { - if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) - return -EFAULT; + if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) { + res = -EFAULT; + goto fini; + } } } else res = (h2p->result == 0) ? 0 : -EIO; +fini: atomic_set(&srp->rq_st, SG_RS_DONE_RD); sg_finish_scsi_blk_rq(srp); sg_deact_request(sfp, srp); @@ -2991,6 +2994,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) bool on_fl = false; int dlen, buflen; unsigned long iflags; + u8 *sbp; struct sg_request *t_srp; struct sg_scatter_hold *schp; const char *cp = "head"; @@ -2999,8 +3003,11 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) return; schp = &srp->sgat_h; /* make sure it is own data buffer */ spin_lock_irqsave(&sfp->rq_list_lock, iflags); + sbp = srp->sense_bp; + srp->sense_bp = NULL; atomic_set(&srp->rq_st, SG_RS_BUSY); list_del_rcu(&srp->rq_entry); + kfree(sbp); /* maybe orphaned req, thus never read */ /* * N.B. sg_request object is not de-allocated (freed). 
The contents * of the rq_list and rq_fl lists are de-allocated (freed) when @@ -3145,6 +3152,7 @@ sg_remove_sfp_usercontext(struct work_struct *work) list_del(&srp->rq_entry); if (srp->sgat_h.buflen > 0) sg_remove_sgat(srp); + kfree(srp->sense_bp); /* abnormal close: device detached */ SG_LOG(6, sdp, "%s:%s%p --\n", __func__, cp, srp); kfree(srp); } @@ -3156,6 +3164,7 @@ sg_remove_sfp_usercontext(struct work_struct *work) list_del(&srp->fl_entry); if (srp->sgat_h.buflen > 0) sg_remove_sgat(srp); + kfree(srp->sense_bp); SG_LOG(6, sdp, "%s: fl%s%p --\n", __func__, cp, srp); kfree(srp); } From patchwork Fri May 24 18:47:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960331 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 47F0D15A6 for ; Fri, 24 May 2019 18:48:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 36AD728A21 for ; Fri, 24 May 2019 18:48:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2AD5328A25; Fri, 24 May 2019 18:48:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 10CBA28A21 for ; Fri, 24 May 2019 18:48:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391640AbfEXSsc (ORCPT ); Fri, 24 May 2019 14:48:32 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56369 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391625AbfEXSsc (ORCPT ); Fri, 24 May 2019 14:48:32 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 7B28C20418A; Fri, 24 May 2019 20:48:28 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1R6rc+9VaMLi; Fri, 24 May 2019 20:48:26 +0200 (CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id C5648204162; Fri, 24 May 2019 20:48:19 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 07/19] sg: add sg v4 interface support Date: Fri, 24 May 2019 14:47:57 -0400 Message-Id: <20190524184809.25121-8-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add support for the sg v4 interface based on struct sg_io_v4 found in include/uapi/linux/bsg.h and only previously supported by the bsg driver. Add ioctl(SG_IOSUBMIT) and ioctl(SG_IORECEIVE) for async (non-blocking) usage of the sg v4 interface. Do not accept the v3 interface with these ioctls. 
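To sketch the intended async calling sequence (illustrative only, not part of this patch: the device node, the INQUIRY command and the request_extra value are placeholders, and the updated <scsi/sg.h> plus <linux/bsg.h> uapi headers are assumed):

  #include <fcntl.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/bsg.h>		/* struct sg_io_v4 */
  #include <scsi/sg.h>		/* SG_IOSUBMIT, SG_IORECEIVE (this series) */

  int main(void)
  {
  	uint8_t cdb[6] = { 0x12, 0, 0, 0, 96, 0 };  /* INQUIRY, 96 bytes */
  	uint8_t buf[96], sense[64];
  	struct sg_io_v4 h;
  	int fd = open("/dev/sg0", O_RDWR);

  	if (fd < 0)
  		return 1;
  	memset(&h, 0, sizeof(h));	/* protocol=0, subprotocol=0 -> SCSI */
  	h.guard = 'Q';			/* distinguishes v4 from v3 objects */
  	h.request = (uint64_t)(uintptr_t)cdb;
  	h.request_len = sizeof(cdb);
  	h.din_xferp = (uint64_t)(uintptr_t)buf;
  	h.din_xfer_len = sizeof(buf);
  	h.response = (uint64_t)(uintptr_t)sense;
  	h.max_response_len = sizeof(sense);
  	h.timeout = 5000;		/* milliseconds */
  	h.request_extra = 1234;		/* carries the pack_id in v4 */
  	if (ioctl(fd, SG_IOSUBMIT, &h) < 0)	/* submit, do not wait */
  		return 1;
  	/* ... other work here; SG_IORECEIVE blocks unless O_NONBLOCK or
  	 * SGV4_FLAG_IMMED is in effect ...
  	 */
  	if (ioctl(fd, SG_IORECEIVE, &h) < 0)
  		return 1;
  	close(fd);
  	return h.device_status;		/* SCSI status byte */
  }

The same sg_io_v4 object is handed to SG_IORECEIVE so the driver can check the 'Q' guard and the protocol fields before matching a completed request.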
Do not accept the v4 interface with this driver's existing write() and read() system calls. For sync (blocking) usage expand the existing ioctl(SG_IO) to additionally accept the sg v4 interface object. A later patch in this set adds multiple requests in a single invocation. Multiple requests use sg v4 interface objects exclusively. Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 503 ++++++++++++++++++++++++++++++++--------- include/uapi/scsi/sg.h | 37 ++- 2 files changed, 429 insertions(+), 111 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index a58875198c16..3b78b8c8304b 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -7,8 +7,9 @@ * * Original driver (sg.c): * Copyright (C) 1992 Lawrence Foard - * Version 2 and 3 extensions to driver: + * Version 2, 3 and 4 extensions to driver: * Copyright (C) 1998 - 2019 Douglas Gilbert + * */ static int sg_version_num = 30536; /* 2 digits for each component */ @@ -40,10 +41,12 @@ static char *sg_version_date = "20140603"; #include #include #include -#include /* for sg_check_file_access() */ +#include /* for sg_check_file_access() */ +#include #include -#include "scsi.h" +#include +#include #include #include #include @@ -103,6 +106,7 @@ enum sg_rq_state { #define SG_ADD_RQ_MAX_RETRIES 40 /* to stop infinite _trylock(s) */ /* Bit positions (flags) for sg_request::frq_bm bitmask follow */ +#define SG_FRQ_IS_V4I 0 /* true (set) when is v4 interface */ #define SG_FRQ_IS_ORPHAN 1 /* owner of request gone */ #define SG_FRQ_SYNC_INVOC 2 /* synchronous (blocking) invocation */ #define SG_FRQ_DIO_IN_USE 3 /* false->indirect_IO,mmap; 1->dio */ @@ -159,6 +163,15 @@ struct sg_slice_hdr3 { void __user *usr_ptr; }; +struct sg_slice_hdr4 { /* parts of sg_io_v4 object needed in async usage */ + void __user *sbp; /* derived from sg_io_v4::response */ + u64 usr_ptr; /* hold sg_io_v4::usr_ptr as given (u64) */ + int out_resid; + s16 dir; /* data xfer direction; SG_DXFER_* */ + u16 cmd_len; /* truncated of sg_io_v4::request_len */ + u16 max_sb_len; /* truncated of sg_io_v4::max_response_len */ +}; + struct sg_scatter_hold { /* holding area for scsi scatter gather info */ struct page **pages; /* num_sgat element array of struct page* */ int buflen; /* capacity in bytes (dlen<=buflen) */ @@ -175,7 +188,10 @@ struct sg_request { /* active SCSI command or inactive on free list (fl) */ struct list_head fl_entry; /* member of rq_fl */ spinlock_t req_lck; struct sg_scatter_hold sgat_h; /* hold buffer, perhaps scatter list */ - struct sg_slice_hdr3 s_hdr3; /* subset of sg_io_hdr */ + union { + struct sg_slice_hdr3 s_hdr3; /* subset of sg_io_hdr */ + struct sg_slice_hdr4 s_hdr4; /* reduced size struct sg_io_v4 */ + }; u32 duration; /* cmd duration in milliseconds */ u32 rq_flags; /* hold user supplied flags */ u32 rq_info; /* info supplied by v3 and v4 interfaces */ @@ -237,7 +253,10 @@ struct sg_device { /* holds the state of each scsi generic device */ struct sg_comm_wr_t { /* arguments to sg_common_write() */ int timeout; unsigned long frq_bm[1]; /* see SG_FRQ_* defines above */ - struct sg_io_hdr *h3p; + union { /* selector is frq_bm.SG_FRQ_IS_V4I */ + struct sg_io_hdr *h3p; + struct sg_io_v4 *h4p; + }; u8 *cmnd; }; @@ -246,13 +265,13 @@ static void sg_rq_end_io(struct request *rq, blk_status_t status); /* Declarations of other static functions used before they are defined */ static int sg_proc_init(void); static int sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, - int dxfer_dir); + struct sg_io_v4 *h4p, int dxfer_dir); // static 
int sg_finish_rem_req(struct sg_request *srp); static void sg_finish_scsi_blk_rq(struct sg_request *srp); static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen); -static int sg_submit(struct file *filp, struct sg_fd *sfp, - struct sg_io_hdr *hp, bool sync, - struct sg_request **o_srp); +static int sg_v3_submit(struct file *filp, struct sg_fd *sfp, + struct sg_io_hdr *hp, bool sync, + struct sg_request **o_srp); static struct sg_request *sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwp); static int sg_rd_append(struct sg_request *srp, void __user *outp, @@ -260,11 +279,11 @@ static int sg_rd_append(struct sg_request *srp, void __user *outp, static void sg_remove_sgat(struct sg_request *srp); static struct sg_fd *sg_add_sfp(struct sg_device *sdp); static void sg_remove_sfp(struct kref *); -static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int pack_id); +static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id); static struct sg_request *sg_add_request(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp); static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp); -static struct sg_device *sg_get_dev(int dev); +static struct sg_device *sg_get_dev(int min_dev); static void sg_device_destroy(struct kref *kref); static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len); @@ -275,10 +294,13 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, enum sg_rq_state want_st, enum sg_rq_state act_old_st); -#define SZ_SG_HEADER ((int)sizeof(struct sg_header)) /* v1 and v2 header */ +#define SZ_SG_HEADER ((int)sizeof(struct sg_header)) /* v1 and v2 header */ #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr)) /* v3 header */ +#define SZ_SG_IO_V4 ((int)sizeof(struct sg_io_v4)) /* v4 header (in bsg.h) */ #define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info)) +/* There is an assert that SZ_SG_IO_V4 >= SZ_SG_IO_HDR in the first function */ + #define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE) #define SG_RS_AWAIT_READ(srp) (atomic_read(&(srp)->rq_st) == SG_RS_AWAIT_RD) @@ -292,7 +314,9 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, #define SG_PROC_DEBUG_SZ 8192 -#if IS_ENABLED(CONFIG_SCSI_LOGGING) +#if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG) +#define SG_LOG_ACTIVE 1 + #define SG_LOG(depth, sdp, fmt, a...) \ do { \ char _b[160]; \ @@ -309,13 +333,9 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, } while (0) #else #define SG_LOG(depth, sdp, fmt, a...) -#endif /* end of CONFIG_SCSI_LOGGING conditional */ +#endif /* end of CONFIG_SCSI_LOGGING && SG_DEBUG conditional */ -#define sg_printk(prefix, sdp, fmt, a...) \ - sdev_prefix_printk(prefix, (sdp)->device, \ - (sdp)->disk->disk_name, fmt, ##a) - /* * The SCSI interfaces that use read() and write() as an asynchronous variant of * ioctl(..., SG_IO, ...)
are fundamentally unsafe, since there are lots of ways @@ -331,6 +351,10 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, static int sg_check_file_access(struct file *filp, const char *caller) { + /* can't put following in declarations where it belongs */ + compiletime_assert(SZ_SG_IO_V4 >= SZ_SG_IO_HDR, + "struct sg_io_v4 should be larger than sg_io_hdr"); + if (filp->f_cred != current_real_cred()) { pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n", caller, task_tgid_vnr(current), current->comm); @@ -346,33 +370,34 @@ sg_check_file_access(struct file *filp, const char *caller) static int sg_wait_open_event(struct sg_device *sdp, bool o_excl) + __must_hold(&sdp->open_rel_lock) { int retval = 0; - if (o_excl) { + if (unlikely(o_excl)) { while (sdp->open_cnt > 0) { mutex_unlock(&sdp->open_rel_lock); retval = wait_event_interruptible(sdp->open_wait, - (atomic_read(&sdp->detaching) || - !sdp->open_cnt)); + (unlikely(atomic_read(&sdp->detaching)) || + !sdp->open_cnt)); mutex_lock(&sdp->open_rel_lock); if (retval) /* -ERESTARTSYS */ return retval; - if (atomic_read(&sdp->detaching)) + if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; } } else { - while (sdp->exclude) { + while (unlikely(sdp->exclude)) { mutex_unlock(&sdp->open_rel_lock); retval = wait_event_interruptible(sdp->open_wait, - (atomic_read(&sdp->detaching) || - !sdp->exclude)); + unlikely((atomic_read(&sdp->detaching)) || + !sdp->exclude)); mutex_lock(&sdp->open_rel_lock); if (retval) /* -ERESTARTSYS */ return retval; - if (atomic_read(&sdp->detaching)) + if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; } } @@ -420,14 +445,13 @@ sg_open(struct inode *inode, struct file *filp) o_excl = !!(op_flags & O_EXCL); non_block = !!(op_flags & O_NONBLOCK); if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY)) - return -EPERM; /* Can't lock it with read only access */ + return -EPERM;/* not permitted, need write access for O_EXCL */ sdp = sg_get_dev(min_dev); if (IS_ERR(sdp)) return PTR_ERR(sdp); - SG_LOG(3, sdp, "%s: minor=%d, op_flags=0x%x; %s count prior=%d%s\n", __func__, min_dev, op_flags, "device open", sdp->open_cnt, - ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : "")); + (non_block ? " O_NONBLOCK" : "")); /* This driver's module count bumped by fops_get in */ /* Prevent the device driver from vanishing while we sleep */ @@ -478,6 +502,7 @@ sg_open(struct inode *inode, struct file *filp) } filp->private_data = sfp; + sfp->tid = (current ? current->pid : -1); sdp->open_cnt++; mutex_unlock(&sdp->open_rel_lock); @@ -500,8 +525,13 @@ sg_open(struct inode *inode, struct file *filp) goto sg_put; } -/* Release resources associated with a successful sg_open() - * Returns 0 on success, else a negated errno value */ +/* + * Release resources associated with a prior, successful sg_open(). It can be + * seen as the (final) close() call on a sg device file descriptor in the user + * space. The real work releasing all resources associated with this file + * descriptor is done by sg_remove_sfp_usercontext() which is scheduled by + * sg_remove_sfp(). + */ static int sg_release(struct inode *inode, struct file *filp) { @@ -586,7 +616,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) if (__copy_from_user(h3u8p + SZ_SG_HEADER, p + SZ_SG_HEADER, SZ_SG_IO_HDR - SZ_SG_HEADER)) return -EFAULT; - res = sg_submit(filp, sfp, h3p, false, NULL); + res = sg_v3_submit(filp, sfp, h3p, false, NULL); return res < 0 ? 
res : (int)count; } /* v1 and v2 interfaces processed below this point */ @@ -657,18 +687,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count; } -static inline int -sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len) -{ - if (!list_empty(&sfp->rq_list)) - return -EBUSY; /* already active requests on fd */ - if (len > sfp->rsv_srp->sgat_h.buflen) - return -ENOMEM; /* MMAP_IO size must fit in reserve */ - if (rq_flags & SG_FLAG_DIRECT_IO) - return -EINVAL; /* not both MMAP_IO and DIRECT_IO */ - return 0; -} - static int sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp) @@ -691,8 +709,8 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp, } static int -sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, - bool sync, struct sg_request **o_srp) +sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, + bool sync, struct sg_request **o_srp) { int res, timeout; unsigned long ul_timeout; @@ -702,9 +720,12 @@ sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, /* now doing v3 blocking (sync) or non-blocking submission */ if (hp->flags & SG_FLAG_MMAP_IO) { - res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len); - if (res) - return res; + if (!list_empty(&sfp->rq_list)) + return -EBUSY; /* already active requests on fd */ + if (hp->dxfer_len > sfp->rsv_srp->sgat_h.buflen) + return -ENOMEM; /* MMAP_IO size must fit in reserve */ + if (hp->flags & SG_FLAG_DIRECT_IO) + return -EINVAL; /* not both MMAP_IO and DIRECT_IO */ } /* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */ set_bit(SG_FFD_CMD_Q, sfp->ffd_bm); @@ -726,16 +747,124 @@ sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, return 0; } +static int +sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, + struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp) +{ + int timeout, res; + unsigned long ul_timeout; + struct sg_request *srp; + struct sg_comm_wr_t cwr; + u8 cmnd[SG_MAX_CDB_SIZE]; + + if (h4p->flags & SG_FLAG_MMAP_IO) { + int len = 0; + + if (h4p->din_xferp) + len = h4p->din_xfer_len; + else if (h4p->dout_xferp) + len = h4p->dout_xfer_len; + if (!list_empty(&sfp->rq_list)) + return -EBUSY; /* already active requests on fd */ + if (len > sfp->rsv_srp->sgat_h.buflen) + return -ENOMEM; /* MMAP_IO size must fit in reserve */ + if (h4p->flags & SG_FLAG_DIRECT_IO) + return -EINVAL; /* not both MMAP_IO and DIRECT_IO */ + } + /* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */ + set_bit(SG_FFD_CMD_Q, sfp->ffd_bm); + ul_timeout = msecs_to_jiffies(h4p->timeout); + timeout = min_t(unsigned long, ul_timeout, INT_MAX); + res = sg_fetch_cmnd(filp, sfp, cuptr64(h4p->request), h4p->request_len, + cmnd); + if (res) + return res; + cwr.frq_bm[0] = 0; + assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync); + set_bit(SG_FRQ_IS_V4I, cwr.frq_bm); + cwr.h4p = h4p; + cwr.timeout = timeout; + cwr.cmnd = cmnd; + srp = sg_common_write(sfp, &cwr); + if (IS_ERR(srp)) + return PTR_ERR(srp); + if (o_srp) + *o_srp = srp; + return res; +} + +static int +sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p) +{ + int res; + u8 hdr_store[SZ_SG_IO_V4]; + struct sg_io_v4 *h4p = (struct sg_io_v4 *)hdr_store; + struct sg_device *sdp = sfp->parentdp; + + res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK)); + if (res) + return res; + if (copy_from_user(hdr_store, p, SZ_SG_IO_V4)) + return 
-EFAULT; + if (h4p->guard == 'Q') + return sg_v4_submit(filp, sfp, p, h4p, false, NULL); + return -EPERM; +} + +static void +sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) +{ + bool at_head, is_v4h, sync; + struct sg_device *sdp = sfp->parentdp; + + is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm); + sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); + SG_LOG(3, sdp, "%s: is_v4h=%d\n", __func__, (int)is_v4h); + srp->start_ns = ktime_get_boot_ns(); + srp->duration = 0; + + if (!is_v4h && srp->s_hdr3.interface_id == '\0') + at_head = true; /* backward compatibility: v1+v2 interfaces */ + else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm)) + /* cmd flags can override sfd setting */ + at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD); + else /* this sfd is defaulting to head */ + at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL); + + kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */ + + /* >>>>>>> send cmd/req off to other levels <<<<<<<< */ + if (!sync) + atomic_inc(&sfp->submitted); + blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk, + srp->rq, (int)at_head, sg_rq_end_io); +} + +static inline int +sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st, + enum sg_rq_state new_st) +{ + enum sg_rq_state act_old_st = (enum sg_rq_state) + atomic_cmpxchg(&srp->rq_st, old_st, new_st); + + if (act_old_st == old_st) + return 0; /* implies new_st --> srp->rq_st */ + else if (IS_ENABLED(SG_LOG_ACTIVE)) + sg_rep_rq_state_fail(srp->parentfp->parentdp, old_st, new_st, + act_old_st); + return -EPROTOTYPE; +} + /* * All writes and submits converge on this function to launch the SCSI * command/request (via blk_execute_rq_nowait). Returns a pointer to a * sg_request object holding the request just issued or a negated errno * value twisted by ERR_PTR. + * N.B. pack_id placed in sg_io_v4::request_extra field. 
*/ static struct sg_request * sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) { - bool at_head; int res = 0; int dxfr_len, dir, cmd_len; int pack_id = SG_PACK_ID_WILDCARD; @@ -743,12 +872,32 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) struct sg_device *sdp = sfp->parentdp; struct sg_request *srp; struct sg_io_hdr *hi_p; - - hi_p = cwrp->h3p; - dir = hi_p->dxfer_direction; - dxfr_len = hi_p->dxfer_len; - rq_flags = hi_p->flags; - pack_id = hi_p->pack_id; + struct sg_io_v4 *h4p; + + if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm)) { + h4p = cwrp->h4p; + hi_p = NULL; + dxfr_len = 0; + dir = SG_DXFER_NONE; + rq_flags = h4p->flags; + pack_id = h4p->request_extra; + if (h4p->din_xfer_len && h4p->dout_xfer_len) { + return ERR_PTR(-EOPNOTSUPP); + } else if (h4p->din_xfer_len) { + dxfr_len = h4p->din_xfer_len; + dir = SG_DXFER_FROM_DEV; + } else if (h4p->dout_xfer_len) { + dxfr_len = h4p->dout_xfer_len; + dir = SG_DXFER_TO_DEV; + } + } else { /* sg v3 interface so hi_p valid */ + h4p = NULL; + hi_p = cwrp->h3p; + dir = hi_p->dxfer_direction; + dxfr_len = hi_p->dxfer_len; + rq_flags = hi_p->flags; + pack_id = hi_p->pack_id; + } if (dxfr_len >= SZ_256M) return ERR_PTR(-EINVAL); @@ -758,13 +907,23 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) srp->rq_flags = rq_flags; srp->pack_id = pack_id; - cmd_len = hi_p->cmd_len; - memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3)); + if (h4p) { + memset(&srp->s_hdr4, 0, sizeof(srp->s_hdr4)); + srp->s_hdr4.usr_ptr = h4p->usr_ptr; + srp->s_hdr4.sbp = uptr64(h4p->response); + srp->s_hdr4.max_sb_len = h4p->max_response_len; + srp->s_hdr4.cmd_len = h4p->request_len; + srp->s_hdr4.dir = dir; + cmd_len = h4p->request_len; + } else { /* v3 interface active */ + cmd_len = hi_p->cmd_len; + memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3)); + } srp->cmd_opcode = cwrp->cmnd[0];/* hold opcode of command for debug */ SG_LOG(4, sdp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__, (int)cwrp->cmnd[0], cmd_len, pack_id); - res = sg_start_req(srp, cwrp->cmnd, cmd_len, dir); + res = sg_start_req(srp, cwrp->cmnd, cmd_len, h4p, dir); if (res < 0) /* probably out of space --> -ENOMEM */ goto err_out; if (unlikely(atomic_read(&sdp->detaching))) { @@ -776,19 +935,8 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) goto err_out; } srp->rq->timeout = cwrp->timeout; - kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). 
*/ - srp->start_ns = ktime_get_boot_ns(); - srp->duration = 0; - if (srp->s_hdr3.interface_id == '\0') - at_head = true; /* backward compatibility: v1+v2 interfaces */ - else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm)) - /* cmd flags can override sfd setting */ - at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD); - else /* this sfd is defaulting to head */ - at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL); - blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk, - srp->rq, at_head, sg_rq_end_io); + sg_execute_cmd(sfp, srp); return srp; err_out: sg_finish_scsi_blk_rq(srp); @@ -796,21 +944,6 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) return ERR_PTR(res); } -static inline int -sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st, - enum sg_rq_state new_st) -{ - enum sg_rq_state act_old_st = (enum sg_rq_state) - atomic_cmpxchg(&srp->rq_st, old_st, new_st); - - if (act_old_st == old_st) - return 0; /* implies new_st --> srp->rq_st */ - else if (IS_ENABLED(CONFIG_SCSI_LOGGING)) - sg_rep_rq_state_fail(srp->parentfp->parentdp, old_st, new_st, - act_old_st); - return -EPROTOTYPE; -} - /* * read(2) related functions follow. They are shown after write(2) related * functions. Apart from read(2) itself, ioctl(SG_IORECEIVE) and the second @@ -841,7 +974,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id) * negated errno value. */ static int -sg_copy_sense(struct sg_request *srp) +sg_copy_sense(struct sg_request *srp, bool v4_active) { int sb_len_wr = 0; int scsi_stat; @@ -856,8 +989,13 @@ sg_copy_sense(struct sg_request *srp) void __user *up; srp->sense_bp = NULL; - up = (void __user *)srp->s_hdr3.sbp; - mx_sb_len = srp->s_hdr3.mx_sb_len; + if (v4_active) { + up = uptr64(srp->s_hdr4.sbp); + mx_sb_len = srp->s_hdr4.max_sb_len; + } else { + up = (void __user *)srp->s_hdr3.sbp; + mx_sb_len = srp->s_hdr3.mx_sb_len; + } if (up && mx_sb_len > 0 && sbp) { sb_len = min_t(int, sb_len, mx_sb_len); /* Additional sense length field */ @@ -898,13 +1036,13 @@ sg_rep_rq_state_fail(struct sg_device *sdp, enum sg_rq_state exp_old_st, #endif static int -sg_rec_v3_state(struct sg_fd *sfp, struct sg_request *srp) +sg_rec_v3v4_state(struct sg_fd *sfp, struct sg_request *srp, bool v4_active) { int err = 0; int sb_len_wr; u32 rq_res = srp->rq_result; - sb_len_wr = sg_copy_sense(srp); + sb_len_wr = sg_copy_sense(srp, v4_active); if (sb_len_wr < 0) return sb_len_wr; if (rq_res & SG_ML_RESULT_MSK) @@ -928,7 +1066,7 @@ sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) struct sg_io_hdr *hp = &hdr3; SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp); - err = sg_rec_v3_state(sfp, srp); + err = sg_rec_v3v4_state(sfp, srp, false); memset(hp, 0, sizeof(*hp)); memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3)); hp->sb_len_wr = srp->sense_len; @@ -951,6 +1089,102 @@ sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) return err ? err : 0; } +static int +sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, + struct sg_io_v4 *h4p) +{ + int err, err2; + u32 rq_result = srp->rq_result; + + SG_LOG(3, sfp->parentdp, "%s: p=%s, h4p=%s\n", __func__, + (p ? "given" : "NULL"), (h4p ? 
"given" : "NULL")); + err = sg_rec_v3v4_state(sfp, srp, true); + h4p->guard = 'Q'; + h4p->protocol = 0; + h4p->subprotocol = 0; + h4p->device_status = rq_result & 0xff; + h4p->driver_status = driver_byte(rq_result); + h4p->transport_status = host_byte(rq_result); + h4p->response_len = srp->sense_len; + h4p->info = srp->rq_info; + h4p->flags = srp->rq_flags; + h4p->duration = srp->duration; + switch (srp->s_hdr4.dir) { + case SG_DXFER_FROM_DEV: + h4p->din_xfer_len = srp->sgat_h.dlen; + break; + case SG_DXFER_TO_DEV: + h4p->dout_xfer_len = srp->sgat_h.dlen; + break; + default: + break; + } + h4p->din_resid = srp->in_resid; + h4p->dout_resid = srp->s_hdr4.out_resid; + h4p->usr_ptr = srp->s_hdr4.usr_ptr; + h4p->response = (u64)srp->s_hdr4.sbp; + h4p->request_extra = srp->pack_id; + if (p) { + if (copy_to_user(p, h4p, SZ_SG_IO_V4)) + err = err ? err : -EFAULT; + } + err2 = sg_rstate_chg(srp, atomic_read(&srp->rq_st), SG_RS_DONE_RD); + if (err2) + err = err ? err : err2; + sg_finish_scsi_blk_rq(srp); + sg_deact_request(sfp, srp); + return err < 0 ? err : 0; +} + +/* + * Called when ioctl(SG_IORECEIVE) received. Expects a v4 interface object. + * Checks if O_NONBLOCK file flag given, if not checks given 'flags' field + * to see if SGV4_FLAG_IMMED is set. Either of these implies non blocking. + * When non-blocking and there is no request waiting, yields EAGAIN; + * otherwise it waits (i.e. it "blocks"). + */ +static int +sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) +{ + bool non_block = !!(filp->f_flags & O_NONBLOCK); + int res, id; + int pack_id = SG_PACK_ID_WILDCARD; + u8 v4_holder[SZ_SG_IO_V4]; + struct sg_io_v4 *h4p = (struct sg_io_v4 *)v4_holder; + struct sg_device *sdp = sfp->parentdp; + struct sg_request *srp; + + res = sg_allow_if_err_recovery(sdp, non_block); + if (res) + return res; + /* Get first three 32 bit integers: guard, proto+subproto */ + if (copy_from_user(h4p, p, SZ_SG_IO_V4)) + return -EFAULT; + /* for v4: protocol=0 --> SCSI; subprotocol=0 --> SPC++ */ + if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0) + return -EPERM; + if (h4p->flags & SGV4_FLAG_IMMED) + non_block = true; /* set by either this or O_NONBLOCK */ + SG_LOG(3, sdp, "%s: non_block(+IMMED)=%d\n", __func__, non_block); + /* read in part of v3 or v4 header for pack_id or tag based find */ + id = pack_id; + srp = sg_find_srp_by_id(sfp, id); + if (!srp) { /* nothing available so wait on packet or */ + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + if (non_block) + return -EAGAIN; + res = wait_event_interruptible(sfp->read_wait, + sg_get_ready_srp(sfp, &srp, + id)); + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + if (res) /* -ERESTARTSYS as signal hit process */ + return res; + } /* now srp should be valid */ + return sg_v4_receive(sfp, srp, p, h4p); +} + static int sg_rd_v1v2(void __user *buf, int count, struct sg_fd *sfp, struct sg_request *srp) @@ -1226,6 +1460,8 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK); rip->pack_id = srp->pack_id; + rip->usr_ptr = test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ? 
+ uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr; rip->usr_ptr = srp->s_hdr3.usr_ptr; spin_unlock(&srp->req_lck); } @@ -1243,7 +1479,7 @@ sg_rq_landed(struct sg_device *sdp, struct sg_request *srp) */ static int sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p, - struct sg_request *srp) + struct sg_io_v4 *h4p, struct sg_request *srp) { int res; enum sg_rq_state sr_st; @@ -1271,7 +1507,10 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p, res = sg_rstate_chg(srp, sr_st, SG_RS_BUSY); if (unlikely(res)) return res; - res = sg_v3_receive(sfp, srp, p); + if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) + res = sg_v4_receive(sfp, srp, p, h4p); + else + res = sg_v3_receive(sfp, srp, p); return (res < 0) ? res : 0; } @@ -1285,8 +1524,9 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, { int res; struct sg_request *srp = NULL; - u8 hu8arr[SZ_SG_IO_HDR]; + u8 hu8arr[SZ_SG_IO_V4]; struct sg_io_hdr *h3p = (struct sg_io_hdr *)hu8arr; + struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr; SG_LOG(3, sdp, "%s: SG_IO%s\n", __func__, ((filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" : "")); @@ -1297,15 +1537,25 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, return -ENODEV; if (copy_from_user(h3p, p, SZ_SG_IO_HDR)) return -EFAULT; - if (h3p->interface_id == 'S') - res = sg_submit(filp, sfp, h3p, true, &srp); - else + if (h3p->interface_id == 'Q') { + /* copy in rest of sg_io_v4 object */ + if (copy_from_user(hu8arr + SZ_SG_IO_HDR, + ((u8 __user *)p) + SZ_SG_IO_HDR, + SZ_SG_IO_V4 - SZ_SG_IO_HDR)) + return -EFAULT; + res = sg_v4_submit(filp, sfp, p, h4p, true, &srp); + } else if (h3p->interface_id == 'S') { + res = sg_v3_submit(filp, sfp, h3p, true, &srp); + } else { + pr_info_once("sg: %s: v3 or v4 interface only here\n", + __func__); return -EPERM; + } if (unlikely(res < 0)) return res; if (!srp) /* mrq case: already processed all responses */ return res; - res = sg_wait_event_srp(filp, sfp, p, srp); + res = sg_wait_event_srp(filp, sfp, p, h4p, srp); if (res) { SG_LOG(1, sdp, "%s: %s=0x%p state: %s\n", __func__, "unexpected srp", srp, @@ -1494,6 +1744,12 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) switch (cmd_in) { case SG_IO: return sg_ctl_sg_io(filp, sdp, sfp, p); + case SG_IOSUBMIT: + SG_LOG(3, sdp, "%s: SG_IOSUBMIT\n", __func__); + return sg_ctl_iosubmit(filp, sfp, p); + case SG_IORECEIVE: + SG_LOG(3, sdp, "%s: SG_IORECEIVE\n", __func__); + return sg_ctl_ioreceive(filp, sfp, p); case SG_GET_SCSI_ID: return sg_ctl_scsi_id(sdev, sdp, p); case SG_SET_FORCE_PACK_ID: @@ -2005,8 +2261,16 @@ sg_rq_end_io(struct request *rq, blk_status_t status) slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE); a_resid = scsi_rp->resid_len; - if (a_resid) - srp->in_resid = a_resid; + if (a_resid) { + if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) { + if (rq_data_dir(rq) == READ) + srp->in_resid = a_resid; + else + srp->s_hdr4.out_resid = a_resid; + } else { + srp->in_resid = a_resid; + } + } SG_LOG(6, sdp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id, srp->rq_result); @@ -2382,7 +2646,8 @@ sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid, } static int -sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir) +sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, + struct sg_io_v4 *h4p, int dxfer_dir) { bool reserved, us_xfer; int res = 0; @@ -2399,7 +2664,6 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir) struct rq_map_data 
*md = (void *)srp; /* want any non-NULL value */ u8 *long_cmdp = NULL; __maybe_unused const char *cp = ""; - struct sg_slice_hdr3 *sh3p = &srp->s_hdr3; struct rq_map_data map_data; sdp = sfp->parentdp; @@ -2409,10 +2673,28 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir) return -ENOMEM; SG_LOG(5, sdp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp); } - up = sh3p->dxferp; - dxfer_len = (int)sh3p->dxfer_len; - iov_count = sh3p->iovec_count; - r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ; + if (h4p) { + if (dxfer_dir == SG_DXFER_TO_DEV) { + r0w = WRITE; + up = uptr64(h4p->dout_xferp); + dxfer_len = (int)h4p->dout_xfer_len; + iov_count = h4p->dout_iovec_count; + } else if (dxfer_dir == SG_DXFER_FROM_DEV) { + r0w = READ; + up = uptr64(h4p->din_xferp); + dxfer_len = (int)h4p->din_xfer_len; + iov_count = h4p->din_iovec_count; + } else { + up = NULL; + } + } else { + struct sg_slice_hdr3 *sh3p = &srp->s_hdr3; + + up = sh3p->dxferp; + dxfer_len = (int)sh3p->dxfer_len; + iov_count = sh3p->iovec_count; + r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ; + } SG_LOG(4, sdp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len, (r0w ? "OUT" : "IN")); q = sdp->device->request_queue; @@ -3506,7 +3788,7 @@ static int sg_proc_dbg_sreq(struct sg_request *srp, int to, char *obp, int len) __must_hold(&srp->req_lck) { - bool is_v3, is_dur; + bool is_v3v4, v4, is_dur; int n = 0; u32 dur; enum sg_rq_state rq_st; @@ -3514,9 +3796,10 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, char *obp, int len) if (len < 1) return 0; - is_v3 = (srp->s_hdr3.interface_id != '\0'); + v4 = test_bit(SG_FRQ_IS_V4I, srp->frq_bm); + is_v3v4 = v4 ? true : (srp->s_hdr3.interface_id != '\0'); if (srp->parentfp->rsv_srp == srp) - cp = (is_v3 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ? + cp = (is_v3v4 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ? " mmap>> " : " rsv>> "; else cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ? diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index bb1be50d7b01..7557c1be01e0 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -98,6 +98,18 @@ typedef struct sg_io_hdr { #define SG_FLAG_Q_AT_TAIL 0x10 #define SG_FLAG_Q_AT_HEAD 0x20 +/* + * Flags used by ioctl(SG_IOSUBMIT) [abbrev: SG_IOS] and ioctl(SG_IORECEIVE) + * [abbrev: SG_IOR] OR-ed into sg_io_v4::flags. The sync v4 interface uses + * ioctl(SG_IO) and can take these new flags, as can the v3 interface. + * These flags apply for SG_IOS unless otherwise noted. May be OR-ed together. 
+ */ +#define SGV4_FLAG_DIRECT_IO SG_FLAG_DIRECT_IO +#define SGV4_FLAG_MMAP_IO SG_FLAG_MMAP_IO +#define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL +#define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD +#define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */ + /* Output (potentially OR-ed together) in v3::info or v4::info field */ #define SG_INFO_OK_MASK 0x1 #define SG_INFO_OK 0x0 /* no sense, host nor driver "noise" */ @@ -133,7 +145,6 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ /* sg_io_owned set imples synchronous, clear implies asynchronous */ char sg_io_owned;/* 0 -> complete with read(), 1 -> owned by SG_IO */ char problem; /* 0 -> no problem detected, 1 -> error to report */ - /* If SG_CTL_FLAGM_TAG_FOR_PACK_ID set on fd then next field is tag */ int pack_id; /* pack_id, in v4 driver may be tag instead */ void __user *usr_ptr; /* user provided pointer in v3+v4 interface */ unsigned int duration; @@ -162,6 +173,13 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ #define SG_SET_RESERVED_SIZE 0x2275 /* request new reserved buffer size */ #define SG_GET_RESERVED_SIZE 0x2272 /* actual size of reserved buffer */ +/* + * Historically the scsi/sg driver has used 0x22 as its ioctl base number. + * Add a define for that value and use it for several new ioctls added in + * the version 4.0.01 sg driver and later. + */ +#define SG_IOCTL_MAGIC_NUM 0x22 + /* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */ #define SG_GET_SCSI_ID 0x2276 /* Yields fd's bus, chan, dev, lun + type */ /* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */ @@ -318,6 +336,23 @@ struct sg_header { */ #define SG_NEXT_CMD_LEN 0x2283 +/* + * New ioctls to replace async (non-blocking) write()/read() interface. + * Present in version 4 and later of the sg driver [>20190427]. The + * SG_IOSUBMIT and SG_IORECEIVE ioctls accept the sg_v4 interface based on + * struct sg_io_v4 found in <include/uapi/linux/bsg.h>. These objects are + * passed by a pointer in the third argument of the ioctl. + * + * Data may be transferred both from the user space to the driver and in + * the opposite direction by these ioctls. Hence the _IOWR macro is used + * here to generate the ioctl number rather than _IOW or _IOR.
+ */ +/* Submits a v4 interface object to driver, optionally receive tag back */ +#define SG_IOSUBMIT _IOWR(SG_IOCTL_MAGIC_NUM, 0x41, struct sg_io_v4) + +/* Gives some v4 identifying info to driver, receives associated response */ +#define SG_IORECEIVE _IOWR(SG_IOCTL_MAGIC_NUM, 0x42, struct sg_io_v4) + /* command queuing is always on when the v3 or v4 interface is used */ #define SG_DEF_COMMAND_Q 0 From patchwork Fri May 24 18:47:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960327 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 185E1933 for ; Fri, 24 May 2019 18:48:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A0A128A21 for ; Fri, 24 May 2019 18:48:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2C8028A24; Fri, 24 May 2019 18:48:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9890228A21 for ; Fri, 24 May 2019 18:48:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391637AbfEXSsb (ORCPT ); Fri, 24 May 2019 14:48:31 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56379 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391612AbfEXSsb (ORCPT ); Fri, 24 May 2019 14:48:31 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id E3965204190; Fri, 24 May 2019 20:48:29 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Kz9-U9VyNGfg; Fri, 24 May 2019 20:48:28 +0200 (CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id F12C3204198; Fri, 24 May 2019 20:48:20 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 08/19] sg: add 8 byte SCSI LUN to sg_scsi_id Date: Fri, 24 May 2019 14:47:58 -0400 Message-Id: <20190524184809.25121-9-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The existing ioctl(SG_GET_SCSI_ID) fills an object of type struct sg_scsi_id whose last field is int unused[2]. Add an anonymous union with u8 scsi_lun[8] sharing those last 8 bytes. This patch places the current device's full LUN in the scsi_lun array using T10's preferred LUN format (i.e. an array of 8 bytes) when ioctl(SG_GET_SCSI_ID) is called. Note that the structure already contains a 'lun' field, but that is a 32-bit integer.
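As a usage sketch (assuming the updated uapi header from this patch; /dev/sg0 is a placeholder device node), the full T10-format LUN can be read back like this:

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <scsi/sg.h>

  int main(void)
  {
  	sg_scsi_id_t sid;
  	int i, fd = open("/dev/sg0", O_RDONLY);

  	if (fd < 0)
  		return 1;
  	if (ioctl(fd, SG_GET_SCSI_ID, &sid) < 0)
  		return 1;
  	printf("host=%d chan=%d id=%d lun(32 bit)=%d lun(T10):",
  	       sid.host_no, sid.channel, sid.scsi_id, sid.lun);
  	for (i = 0; i < 8; i++)		/* 8 byte, big-endian T10 format */
  		printf(" %02x", sid.scsi_lun[i]);
  	printf("\n");
  	close(fd);
  	return 0;
  }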
Users of this upgrade should choose the scsi_lun array field henceforth but existing code should do the same as it does with the version 3 driver. Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 5 +++-- include/uapi/scsi/sg.h | 5 ++++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 3b78b8c8304b..454abfdd4f74 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1694,6 +1694,7 @@ static int sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_device *sdp, void __user *p) { struct sg_scsi_id __user *sg_idp = p; + struct scsi_lun lun8b; SG_LOG(3, sdp, "%s: SG_GET_SCSI_ID\n", __func__); if (!access_ok(p, sizeof(struct sg_scsi_id))) @@ -1711,8 +1712,8 @@ sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_device *sdp, void __user *p) &sg_idp->h_cmd_per_lun); __put_user((short)sdev->queue_depth, &sg_idp->d_queue_depth); - __put_user(0, &sg_idp->unused[0]); - __put_user(0, &sg_idp->unused[1]); + int_to_scsilun(sdev->lun, &lun8b); + __copy_to_user(sg_idp->scsi_lun, lun8b.scsi_lun, 8); return 0; } diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index 7557c1be01e0..db86d1ae7e29 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -135,7 +135,10 @@ typedef struct sg_scsi_id { int scsi_type; /* TYPE_... defined in scsi/scsi.h */ short h_cmd_per_lun;/* host (adapter) maximum commands per lun */ short d_queue_depth;/* device (or adapter) maximum queue length */ - int unused[2]; + union { + int unused[2]; /* as per version 3 driver */ + __u8 scsi_lun[8]; /* full 8 byte SCSI LUN [in v4 driver] */ + }; } sg_scsi_id_t; /* For backward compatibility v4 driver yields at most SG_MAX_QUEUE of these */ From patchwork Fri May 24 18:47:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960337 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B3DC1390 for ; Fri, 24 May 2019 18:48:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 88DCC28A21 for ; Fri, 24 May 2019 18:48:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7AF0328A28; Fri, 24 May 2019 18:48:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EF77728A22 for ; Fri, 24 May 2019 18:48:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391668AbfEXSsg (ORCPT ); Fri, 24 May 2019 14:48:36 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56391 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391625AbfEXSsf (ORCPT ); Fri, 24 May 2019 14:48:35 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 7C5EB204170; Fri, 24 May 2019 20:48:32 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nNsfDYeV1Kai; Fri, 24 May 2019 20:48:28 +0200 
(CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id 27BE4204172; Fri, 24 May 2019 20:48:22 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 09/19] sg: expand sg_comm_wr_t Date: Fri, 24 May 2019 14:47:59 -0400 Message-Id: <20190524184809.25121-10-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP
The internal struct sg_comm_wr_t was added when the number of arguments to sg_common_write() became excessive. Expand this idea so multiple calls to sg_fetch_cmnd() can be deferred until a scsi_request object is ready to receive the command. This saves a 252-byte stack allocation on every submit path. Prior to this and a few other changes, the kernel infrastructure was warning about excessive stack usage. Also make open_cnt (count of active open()s on this device) an atomic. Add more unlikely() and likely() hints on conditionals.
Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 348 ++++++++++++++++++++++++---------------------- 1 file changed, 183 insertions(+), 165 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 454abfdd4f74..638368ed9e11 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -244,7 +244,7 @@ struct sg_device { /* holds the state of each scsi generic device */ atomic_t detaching; /* 0->device usable, 1->device detaching */ bool exclude; /* 1->open(O_EXCL) succeeded and is active */ u8 sgdebug; /* 0->off, 1->sense, 9->dump dev, 10-> all devs */ - int open_cnt; /* count of opens (perhaps < num(sfds) ) */ + atomic_t open_cnt; /* count of opens (perhaps < num(sfds) ) */ struct gendisk *disk; struct cdev * cdev; /* char_dev [sysfs: /sys/cdev/major/sg] */ struct kref d_ref; @@ -252,36 +252,39 @@ struct sg_device { /* holds the state of each scsi generic device */ struct sg_comm_wr_t { /* arguments to sg_common_write() */ int timeout; + int cmd_len; unsigned long frq_bm[1]; /* see SG_FRQ_* defines above */ union { /* selector is frq_bm.SG_FRQ_IS_V4I */ struct sg_io_hdr *h3p; struct sg_io_v4 *h4p; }; - u8 *cmnd; + struct sg_fd *sfp; + struct file *filp; + const u8 __user *u_cmdp; + const u8 *cmdp; }; /* tasklet or soft irq callback */ static void sg_rq_end_io(struct request *rq, blk_status_t status); /* Declarations of other static functions used before they are defined */ static int sg_proc_init(void); -static int sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, - struct sg_io_v4 *h4p, int dxfer_dir); +static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, + int dxfer_dir); // static int sg_finish_rem_req(struct sg_request *srp); static void sg_finish_scsi_blk_rq(struct sg_request *srp); static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen); static int sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_request **o_srp); -static struct sg_request *sg_common_write(struct sg_fd *sfp, - struct sg_comm_wr_t *cwp); +static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwp); static int sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer); static void sg_remove_sgat(struct
sg_request *srp); static struct sg_fd *sg_add_sfp(struct sg_device *sdp); static void sg_remove_sfp(struct kref *); static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id); -static struct sg_request *sg_add_request(struct sg_fd *sfp, int dxfr_len, - struct sg_comm_wr_t *cwrp); +static struct sg_request *sg_add_request(struct sg_comm_wr_t *cwrp, + int dxfr_len); static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp); static struct sg_device *sg_get_dev(int min_dev); static void sg_device_destroy(struct kref *kref); @@ -322,7 +325,7 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, char _b[160]; \ int _tid = (current ? current->pid : -1); \ \ - if ((sdp) && (sdp)->disk) { \ + if (likely((sdp) && (sdp)->disk)) { \ snprintf(_b, sizeof(_b), "%s: tid=%d", \ (sdp)->disk->disk_name, _tid); \ SCSI_LOG_TIMEOUT(depth, \ @@ -372,37 +375,38 @@ static int sg_wait_open_event(struct sg_device *sdp, bool o_excl) __must_hold(&sdp->open_rel_lock) { - int retval = 0; + int res = 0; if (unlikely(o_excl)) { - while (sdp->open_cnt > 0) { + while (atomic_read(&sdp->open_cnt) > 0) { mutex_unlock(&sdp->open_rel_lock); - retval = wait_event_interruptible(sdp->open_wait, - (unlikely(atomic_read(&sdp->detaching)) || - !sdp->open_cnt)); + res = wait_event_interruptible + (sdp->open_wait, + (unlikely(atomic_read(&sdp->detaching)) || + !atomic_read(&sdp->open_cnt))); mutex_lock(&sdp->open_rel_lock); - if (retval) /* -ERESTARTSYS */ - return retval; + if (unlikely(res)) /* -ERESTARTSYS */ + return res; if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; } } else { while (unlikely(sdp->exclude)) { mutex_unlock(&sdp->open_rel_lock); - retval = wait_event_interruptible(sdp->open_wait, - unlikely((atomic_read(&sdp->detaching)) || - !sdp->exclude)); + res = wait_event_interruptible + (sdp->open_wait, + unlikely((atomic_read(&sdp->detaching)) || + !sdp->exclude)); mutex_lock(&sdp->open_rel_lock); - if (retval) /* -ERESTARTSYS */ - return retval; + if (unlikely(res)) /* -ERESTARTSYS */ + return res; if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; } } - - return retval; + return res; } /* @@ -415,7 +419,7 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl) static inline int sg_allow_if_err_recovery(struct sg_device *sdp, bool non_block) { - if (!sdp) + if (unlikely(!sdp)) return -EPROTO; if (non_block) return 0; @@ -439,7 +443,7 @@ sg_open(struct inode *inode, struct file *filp) int op_flags = filp->f_flags; struct sg_device *sdp; struct sg_fd *sfp; - int retval; + int res; nonseekable_open(inode, filp); o_excl = !!(op_flags & O_EXCL); @@ -450,69 +454,70 @@ sg_open(struct inode *inode, struct file *filp) if (IS_ERR(sdp)) return PTR_ERR(sdp); SG_LOG(3, sdp, "%s: minor=%d, op_flags=0x%x; %s count prior=%d%s\n", - __func__, min_dev, op_flags, "device open", sdp->open_cnt, - (non_block ? " O_NONBLOCK" : "")); + __func__, min_dev, op_flags, "device open", + atomic_read(&sdp->open_cnt), (non_block ? 
" O_NONBLOCK" : "")); /* This driver's module count bumped by fops_get in */ /* Prevent the device driver from vanishing while we sleep */ - retval = scsi_device_get(sdp->device); - if (retval) + res = scsi_device_get(sdp->device); + if (unlikely(res)) goto sg_put; - retval = scsi_autopm_get_device(sdp->device); - if (retval) + res = scsi_autopm_get_device(sdp->device); + if (unlikely(res)) goto sdp_put; - retval = sg_allow_if_err_recovery(sdp, non_block); - if (retval) + res = sg_allow_if_err_recovery(sdp, non_block); + if (unlikely(res)) goto error_out; mutex_lock(&sdp->open_rel_lock); if (op_flags & O_NONBLOCK) { - if (o_excl) { - if (sdp->open_cnt > 0) { - retval = -EBUSY; + if (unlikely(o_excl)) { + if (atomic_read(&sdp->open_cnt) > 0) { + res = -EBUSY; goto error_mutex_locked; } } else { - if (sdp->exclude) { - retval = -EBUSY; + if (unlikely(sdp->exclude)) { + res = -EBUSY; goto error_mutex_locked; } } } else { - retval = sg_wait_open_event(sdp, o_excl); - if (retval) /* -ERESTARTSYS or -ENODEV */ + res = sg_wait_open_event(sdp, o_excl); + if (unlikely(res)) /* -ERESTARTSYS or -ENODEV */ goto error_mutex_locked; } /* N.B. at this point we are holding the open_rel_lock */ - if (o_excl) + if (unlikely(o_excl)) sdp->exclude = true; - if (sdp->open_cnt < 1) { /* no existing opens */ + if (atomic_read(&sdp->open_cnt) < 1) { /* no existing opens */ sdp->sgdebug = 0; /* Next done in sg_alloc(), repeat here to pick up change? */ sg_calc_sgat_param(sdp); } sfp = sg_add_sfp(sdp); if (IS_ERR(sfp)) { - retval = PTR_ERR(sfp); + res = PTR_ERR(sfp); goto out_undo; } filp->private_data = sfp; sfp->tid = (current ? current->pid : -1); - sdp->open_cnt++; + sfp->tid = (current ? current->pid : -1); + atomic_inc(&sdp->open_cnt); mutex_unlock(&sdp->open_rel_lock); - retval = 0; + res = 0; sg_put: kref_put(&sdp->d_ref, sg_device_destroy); - return retval; + return res; out_undo: - if (o_excl) { + if (unlikely(o_excl)) { sdp->exclude = false; /* undo if error */ wake_up_interruptible(&sdp->open_wait); } @@ -541,21 +546,21 @@ sg_release(struct inode *inode, struct file *filp) sfp = filp->private_data; sdp = sfp->parentdp; SG_LOG(3, sdp, "%s: device open count prior=%d\n", __func__, - sdp->open_cnt); - if (!sdp) + atomic_read(&sdp->open_cnt)); + if (unlikely(!sdp)) return -ENXIO; mutex_lock(&sdp->open_rel_lock); scsi_autopm_put_device(sdp->device); kref_put(&sfp->f_ref, sg_remove_sfp); - sdp->open_cnt--; + atomic_dec(&sdp->open_cnt); /* possibly many open()s waiting on exlude clearing, start many; * only open(O_EXCL)s wait on 0==open_cnt so only start one */ - if (sdp->exclude) { + if (unlikely(sdp->exclude)) { sdp->exclude = false; wake_up_interruptible_all(&sdp->open_wait); - } else if (0 == sdp->open_cnt) { + } else if (atomic_read(&sdp->open_cnt) == 0) { wake_up_interruptible(&sdp->open_wait); } mutex_unlock(&sdp->open_rel_lock); @@ -577,7 +582,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) struct sg_device *sdp; struct sg_fd *sfp; struct sg_request *srp; - u8 cmnd[SG_MAX_CDB_SIZE]; struct sg_header ov2hdr; struct sg_io_hdr v3hdr; struct sg_header *ohp = &ov2hdr; @@ -585,14 +589,14 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) struct sg_comm_wr_t cwr; res = sg_check_file_access(filp, __func__); - if (res) + if (unlikely(res)) return res; sfp = filp->private_data; sdp = sfp->parentdp; SG_LOG(3, sdp, "%s: write(3rd arg) count=%d\n", __func__, (int)count); res = sg_allow_if_err_recovery(sdp, !!(filp->f_flags & O_NONBLOCK)); - if (res) + 
if (unlikely(res)) return res; if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; @@ -664,9 +668,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) h3p->flags = input_size; /* structure abuse ... */ h3p->pack_id = ohp->pack_id; h3p->usr_ptr = NULL; - cmnd[0] = opcode; - if (__copy_from_user(cmnd + 1, p + 1, cmd_size - 1)) - return -EFAULT; /* * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV, * but it is possible that the app intended SG_DXFER_TO_DEV, because @@ -677,13 +678,17 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) "%s: data in/out %d/%d bytes for SCSI command 0x%x-- guessing data in;\n" " program %s not setting count and/or reply_len properly\n", __func__, ohp->reply_len - (int)SZ_SG_HEADER, - input_size, (unsigned int)cmnd[0], current->comm); + input_size, (unsigned int)opcode, current->comm); } cwr.frq_bm[0] = 0; /* initial state clear for all req flags */ cwr.h3p = h3p; cwr.timeout = sfp->timeout; - cwr.cmnd = cmnd; - srp = sg_common_write(sfp, &cwr); + cwr.cmd_len = cmd_size; + cwr.filp = filp; + cwr.sfp = sfp; + cwr.u_cmdp = p; + cwr.cmdp = NULL; + srp = sg_common_write(&cwr); return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count; } @@ -712,11 +717,9 @@ static int sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_request **o_srp) { - int res, timeout; unsigned long ul_timeout; struct sg_request *srp; struct sg_comm_wr_t cwr; - u8 cmnd[SG_MAX_CDB_SIZE]; /* now doing v3 blocking (sync) or non-blocking submission */ if (hp->flags & SG_FLAG_MMAP_IO) { @@ -730,16 +733,16 @@ sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, /* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */ set_bit(SG_FFD_CMD_Q, sfp->ffd_bm); ul_timeout = msecs_to_jiffies(hp->timeout); - timeout = min_t(unsigned long, ul_timeout, INT_MAX); - res = sg_fetch_cmnd(filp, sfp, hp->cmdp, hp->cmd_len, cmnd); - if (res) - return res; cwr.frq_bm[0] = 0; assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync); cwr.h3p = hp; - cwr.timeout = timeout; - cwr.cmnd = cmnd; - srp = sg_common_write(sfp, &cwr); + cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX); + cwr.cmd_len = hp->cmd_len; + cwr.filp = filp; + cwr.sfp = sfp; + cwr.u_cmdp = hp->cmdp; + cwr.cmdp = NULL; + srp = sg_common_write(&cwr); if (IS_ERR(srp)) return PTR_ERR(srp); if (o_srp) @@ -751,12 +754,15 @@ static int sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp) { - int timeout, res; + int res = 0; unsigned long ul_timeout; struct sg_request *srp; struct sg_comm_wr_t cwr; - u8 cmnd[SG_MAX_CDB_SIZE]; + memset(&cwr, 0, sizeof(cwr)); + cwr.filp = filp; + cwr.sfp = sfp; + cwr.h4p = h4p; if (h4p->flags & SG_FLAG_MMAP_IO) { int len = 0; @@ -774,18 +780,15 @@ sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, /* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */ set_bit(SG_FFD_CMD_Q, sfp->ffd_bm); ul_timeout = msecs_to_jiffies(h4p->timeout); - timeout = min_t(unsigned long, ul_timeout, INT_MAX); - res = sg_fetch_cmnd(filp, sfp, cuptr64(h4p->request), h4p->request_len, - cmnd); - if (res) - return res; cwr.frq_bm[0] = 0; assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync); set_bit(SG_FRQ_IS_V4I, cwr.frq_bm); cwr.h4p = h4p; - cwr.timeout = timeout; - cwr.cmnd = cmnd; - srp = sg_common_write(sfp, &cwr); + cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX); + cwr.cmd_len = h4p->request_len; + 
cwr.u_cmdp = cuptr64(h4p->request); + cwr.cmdp = NULL; + srp = sg_common_write(&cwr); if (IS_ERR(srp)) return PTR_ERR(srp); if (o_srp) @@ -802,7 +805,7 @@ sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p) struct sg_device *sdp = sfp->parentdp; res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK)); - if (res) + if (unlikely(res)) return res; if (copy_from_user(hdr_store, p, SZ_SG_IO_V4)) return -EFAULT; @@ -863,13 +866,14 @@ sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st, * N.B. pack_id placed in sg_io_v4::request_extra field. */ static struct sg_request * -sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) +sg_common_write(struct sg_comm_wr_t *cwrp) { int res = 0; - int dxfr_len, dir, cmd_len; + int dxfr_len, dir; int pack_id = SG_PACK_ID_WILDCARD; u32 rq_flags; - struct sg_device *sdp = sfp->parentdp; + struct sg_fd *fp = cwrp->sfp; + struct sg_device *sdp = fp->parentdp; struct sg_request *srp; struct sg_io_hdr *hi_p; struct sg_io_v4 *h4p; @@ -901,7 +905,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) if (dxfr_len >= SZ_256M) return ERR_PTR(-EINVAL); - srp = sg_add_request(sfp, dxfr_len, cwrp); + srp = sg_add_request(cwrp, dxfr_len); if (IS_ERR(srp)) return srp; srp->rq_flags = rq_flags; @@ -914,18 +918,16 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) srp->s_hdr4.max_sb_len = h4p->max_response_len; srp->s_hdr4.cmd_len = h4p->request_len; srp->s_hdr4.dir = dir; - cmd_len = h4p->request_len; } else { /* v3 interface active */ - cmd_len = hi_p->cmd_len; memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3)); } - srp->cmd_opcode = cwrp->cmnd[0];/* hold opcode of command for debug */ - SG_LOG(4, sdp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__, - (int)cwrp->cmnd[0], cmd_len, pack_id); - res = sg_start_req(srp, cwrp->cmnd, cmd_len, h4p, dir); - if (res < 0) /* probably out of space --> -ENOMEM */ + res = sg_start_req(srp, cwrp, dir); + if (unlikely(res < 0)) /* probably out of space --> -ENOMEM */ goto err_out; + SG_LOG(4, sdp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__, + srp->cmd_opcode, cwrp->cmd_len, pack_id); + if (unlikely(atomic_read(&sdp->detaching))) { res = -ENODEV; goto err_out; @@ -936,11 +938,11 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp) } srp->rq->timeout = cwrp->timeout; - sg_execute_cmd(sfp, srp); + sg_execute_cmd(fp, srp); return srp; err_out: sg_finish_scsi_blk_rq(srp); - sg_deact_request(sfp, srp); + sg_deact_request(fp, srp); return ERR_PTR(res); } @@ -1155,7 +1157,7 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) struct sg_request *srp; res = sg_allow_if_err_recovery(sdp, non_block); - if (res) + if (unlikely(res)) return res; /* Get first three 32 bit integers: guard, proto+subproto */ if (copy_from_user(h4p, p, SZ_SG_IO_V4)) @@ -1179,8 +1181,8 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) id)); if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; - if (res) /* -ERESTARTSYS as signal hit process */ - return res; + if (unlikely(res)) + return res; /* signal --> -ERESTARTSYS */ } /* now srp should be valid */ return sg_v4_receive(sfp, srp, p, h4p); } @@ -1531,7 +1533,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, SG_LOG(3, sdp, "%s: SG_IO%s\n", __func__, ((filp->f_flags & O_NONBLOCK) ? 
" O_NONBLOCK ignored" : "")); res = sg_allow_if_err_recovery(sdp, false); - if (res) + if (unlikely(res)) return res; if (atomic_read(&sdp->detaching)) return -ENODEV; @@ -1556,7 +1558,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, if (!srp) /* mrq case: already processed all responses */ return res; res = sg_wait_event_srp(filp, sfp, p, h4p, srp); - if (res) { + if (unlikely(res)) { SG_LOG(1, sdp, "%s: %s=0x%p state: %s\n", __func__, "unexpected srp", srp, sg_rq_st_str(atomic_read(&srp->rq_st), false)); @@ -1605,7 +1607,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) /* new sg_request object, sized correctly is now available */ spin_lock_irqsave(&sfp->rq_list_lock, iflags); o_srp = sfp->rsv_srp; - if (!o_srp) { + if (unlikely(!o_srp)) { res = -EPROTO; goto wr_unlock; } @@ -1736,7 +1738,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) sdp = sfp->parentdp; SG_LOG(6, sdp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in, !!(filp->f_flags & O_NONBLOCK)); - if (!sdp) + if (unlikely(!sdp)) return -ENXIO; if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; @@ -1756,7 +1758,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_SET_FORCE_PACK_ID: SG_LOG(3, sdp, "%s: SG_SET_FORCE_PACK_ID\n", __func__); result = get_user(val, ip); - if (result) + if (unlikely(result)) return result; assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val); return 0; @@ -1786,7 +1788,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_SET_RESERVED_SIZE: mutex_lock(&sfp->f_mutex); result = get_user(val, ip); - if (!result) { + if (likely(!result)) { if (val >= 0 && val <= (1024 * 1024 * 1024)) { result = sg_set_reserved_sz(sfp, val); } else { @@ -1808,7 +1810,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_SET_COMMAND_Q: SG_LOG(3, sdp, "%s: SG_SET_COMMAND_Q\n", __func__); result = get_user(val, ip); - if (result) + if (unlikely(result)) return result; assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val); return 0; @@ -1818,7 +1820,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_SET_KEEP_ORPHAN: SG_LOG(3, sdp, "%s: SG_SET_KEEP_ORPHAN\n", __func__); result = get_user(val, ip); - if (result) + if (unlikely(result)) return result; assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, !!val); return 0; @@ -1838,7 +1840,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_SET_TIMEOUT: SG_LOG(3, sdp, "%s: SG_SET_TIMEOUT\n", __func__); result = get_user(val, ip); - if (result) + if (unlikely(result)) return result; if (val < 0) return -EIO; @@ -1866,7 +1868,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_NEXT_CMD_LEN: /* active only in v2 interface */ SG_LOG(3, sdp, "%s: SG_NEXT_CMD_LEN\n", __func__); result = get_user(val, ip); - if (result) + if (unlikely(result)) return result; if (val > SG_MAX_CDB_SIZE) return -ENOMEM; @@ -1893,7 +1895,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_SET_DEBUG: SG_LOG(3, sdp, "%s: SG_SET_DEBUG\n", __func__); result = get_user(val, ip); - if (result) + if (unlikely(result)) return result; sdp->sgdebug = (u8)val; return 0; @@ -1951,7 +1953,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return -ENODEV; } result = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NDELAY)); - if (result) + if (unlikely(result)) return result; /* ioctl that reach here are forwarded to the mid-level */ return 
scsi_ioctl(sdev, cmd_in, p); @@ -1967,7 +1969,7 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) sfp = filp->private_data; sdp = sfp->parentdp; - if (!sdp) + if (unlikely(!sdp)) return -ENXIO; sdev = sdp->device; @@ -1996,7 +1998,7 @@ sg_poll(struct file *filp, poll_table * wait) sfp = filp->private_data; sdp = sfp->parentdp; - if (!sdp) + if (unlikely(!sdp)) return EPOLLERR; poll_wait(filp, &sfp->read_wait, wait); if (atomic_read(&sfp->waiting) > 0) @@ -2022,7 +2024,7 @@ sg_fasync(int fd, struct file *filp, int mode) sfp = filp->private_data; sdp = sfp->parentdp; SG_LOG(3, sdp, "%s: mode(%s)\n", __func__, (mode ? "add" : "remove")); - if (!sdp) + if (unlikely(!sdp)) return -ENXIO; return fasync_helper(fd, filp, mode, &sfp->async_qp); } @@ -2040,12 +2042,12 @@ sg_vma_fault(struct vm_fault *vmf) struct sg_fd *sfp; const char *nbp = "==NULL, bad"; - if (!vma) { + if (unlikely(!vma)) { pr_warn("%s: vma%s\n", __func__, nbp); goto out_err; } sfp = vma->vm_private_data; - if (!sfp) { + if (unlikely(!sfp)) { pr_warn("%s: sfp%s\n", __func__, nbp); goto out_err; } @@ -2055,14 +2057,14 @@ sg_vma_fault(struct vm_fault *vmf) goto out_err; } srp = sfp->rsv_srp; - if (!srp) { + if (unlikely(!srp)) { SG_LOG(1, sdp, "%s: srp%s\n", __func__, nbp); goto out_err; } spin_lock_irqsave(&srp->req_lck, iflags); rsv_schp = &srp->sgat_h; offset = vmf->pgoff << PAGE_SHIFT; - if (offset >= (unsigned int)rsv_schp->buflen) { + if (unlikely(offset >= (unsigned int)rsv_schp->buflen)) { SG_LOG(1, sdp, "%s: offset[%lu] >= rsv.buflen\n", __func__, offset); goto out_err_unlock; @@ -2108,28 +2110,28 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma) struct sg_fd *sfp; struct sg_request *srp; - if (!filp || !vma) + if (unlikely(!filp || !vma)) return -ENXIO; sfp = filp->private_data; - if (!sfp) { + if (unlikely(!sfp)) { pr_warn("sg: %s: sfp is NULL\n", __func__); return -ENXIO; } req_sz = vma->vm_end - vma->vm_start; SG_LOG(3, sfp->parentdp, "%s: vm_start=%p, len=%d\n", __func__, (void *)vma->vm_start, (int)req_sz); - if (vma->vm_pgoff) + if (unlikely(vma->vm_pgoff)) return -EINVAL; /* only an offset of 0 accepted */ /* Check reserve request is inactive and has large enough buffer */ mutex_lock(&sfp->f_mutex); srp = sfp->rsv_srp; spin_lock_irqsave(&srp->req_lck, iflags); - if (SG_RS_ACTIVE(srp)) { + if (unlikely(SG_RS_ACTIVE(srp))) { ret = -EBUSY; goto out; } rsv_schp = &srp->sgat_h; - if (req_sz > (unsigned long)rsv_schp->buflen) { + if (unlikely(req_sz > (unsigned long)rsv_schp->buflen)) { ret = -ENOMEM; goto out; } @@ -2165,12 +2167,12 @@ sg_rq_end_io_usercontext(struct work_struct *work) ew_orph.work); struct sg_fd *sfp; - if (!srp) { + if (unlikely(!srp)) { WARN_ONCE("%s: srp unexpectedly NULL\n", __func__); return; } sfp = srp->parentfp; - if (!sfp) { + if (unlikely(!sfp)) { WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__); return; } @@ -2190,7 +2192,7 @@ sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len) struct scsi_request *scsi_rp = scsi_req(srp->rq); u8 *sbp = scsi_rp ? 
scsi_rp->sense : NULL; - if (!sbp) + if (unlikely(!sbp)) return; driver_stat = driver_byte(rq_res); if (driver_stat & DRIVER_SENSE) { @@ -2205,7 +2207,7 @@ sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len) } } } - if (sdp->sgdebug > 0) { + if (unlikely(sdp->sgdebug > 0)) { int scsi_stat = rq_res & 0xff; if (scsi_stat == SAM_STAT_CHECK_CONDITION || @@ -2231,11 +2233,11 @@ sg_rq_end_io(struct request *rq, blk_status_t status) struct sg_device *sdp; struct sg_fd *sfp; - if (!scsi_rp) { + if (unlikely(!scsi_rp)) { WARN_ONCE("%s: scsi_req(rq) unexpectedly NULL\n", __func__); return; } - if (!srp) { + if (unlikely(!srp)) { WARN_ONCE("%s: srp unexpectedly NULL\n", __func__); return; } @@ -2360,14 +2362,14 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) unsigned long iflags; sdp = kzalloc(sizeof(struct sg_device), GFP_KERNEL); - if (!sdp) + if (unlikely(!sdp)) return ERR_PTR(-ENOMEM); idr_preload(GFP_KERNEL); write_lock_irqsave(&sg_index_lock, iflags); error = idr_alloc(&sg_index_idr, sdp, 0, SG_MAX_DEVS, GFP_NOWAIT); - if (error < 0) { + if (unlikely(error < 0)) { if (error == -ENOSPC) { sdev_printk(KERN_WARNING, scsidp, "Unable to attach sg device type=%d, minor number exceeds %d\n", @@ -2392,6 +2394,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) INIT_LIST_HEAD(&sdp->sfds); init_waitqueue_head(&sdp->open_wait); atomic_set(&sdp->detaching, 0); + atomic_set(&sdp->open_cnt, 0); rwlock_init(&sdp->sfd_llock); sg_calc_sgat_param(sdp); sdp->index = k; @@ -2402,7 +2405,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp) write_unlock_irqrestore(&sg_index_lock, iflags); idr_preload_end(); - if (error) { + if (unlikely(error)) { kfree(sdp); return ERR_PTR(error); } @@ -2420,7 +2423,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) unsigned long iflags; disk = alloc_disk(1); - if (!disk) { + if (unlikely(!disk)) { pr_warn("%s: alloc_disk failed\n", __func__); return -ENOMEM; } @@ -2428,7 +2431,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) error = -ENOMEM; cdev = cdev_alloc(); - if (!cdev) { + if (unlikely(!cdev)) { pr_warn("%s: cdev_alloc failed\n", __func__); goto out; } @@ -2443,7 +2446,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) } error = cdev_add(cdev, MKDEV(SCSI_GENERIC_MAJOR, sdp->index), 1); - if (error) + if (unlikely(error)) goto cdev_add_err; sdp->cdev = cdev; @@ -2517,7 +2520,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf) struct sg_fd *sfp; int val; - if (!sdp) + if (unlikely(!sdp)) return; /* want sdp->detaching non-zero as soon as possible */ val = atomic_inc_return(&sdp->detaching); @@ -2576,7 +2579,7 @@ init_sg(void) rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS, "sg"); - if (rc) + if (unlikely(rc)) return rc; pr_info("Registered %s[char major=0x%x], version: %s, date: %s\n", @@ -2589,7 +2592,7 @@ init_sg(void) } sg_sysfs_valid = true; rc = scsi_register_interface(&sg_interface); - if (0 == rc) { + if (likely(rc == 0)) { sg_proc_init(); return 0; } @@ -2647,8 +2650,7 @@ sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid, } static int -sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, - struct sg_io_v4 *h4p, int dxfer_dir) +sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) { bool reserved, us_xfer; int res = 0; @@ -2658,7 +2660,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, void __user *up; struct request *rq; struct 
scsi_request *scsi_rp; - struct sg_fd *sfp = srp->parentfp; + struct sg_fd *sfp = cwrp->sfp; struct sg_device *sdp; struct sg_scatter_hold *req_schp; struct request_queue *q; @@ -2668,20 +2670,22 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, struct rq_map_data map_data; sdp = sfp->parentdp; - if (cmd_len > BLK_MAX_CDB) { /* for longer SCSI cdb_s */ - long_cmdp = kzalloc(cmd_len, GFP_KERNEL); + if (cwrp->cmd_len > BLK_MAX_CDB) { /* for longer SCSI cdb_s */ + long_cmdp = kzalloc(cwrp->cmd_len, GFP_KERNEL); if (!long_cmdp) return -ENOMEM; SG_LOG(5, sdp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp); } - if (h4p) { + if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) { + struct sg_io_v4 *h4p = cwrp->h4p; + if (dxfer_dir == SG_DXFER_TO_DEV) { r0w = WRITE; up = uptr64(h4p->dout_xferp); dxfer_len = (int)h4p->dout_xfer_len; iov_count = h4p->dout_iovec_count; } else if (dxfer_dir == SG_DXFER_FROM_DEV) { - r0w = READ; + /* r0w = READ; */ up = uptr64(h4p->din_xferp); dxfer_len = (int)h4p->din_xfer_len; iov_count = h4p->din_iovec_count; @@ -2720,10 +2724,21 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, scsi_rp = scsi_req(rq); srp->rq = rq; - if (cmd_len > BLK_MAX_CDB) + if (cwrp->cmd_len > BLK_MAX_CDB) scsi_rp->cmd = long_cmdp; - memcpy(scsi_rp->cmd, cmd, cmd_len); - scsi_rp->cmd_len = cmd_len; + if (cwrp->u_cmdp) + res = sg_fetch_cmnd(cwrp->filp, sfp, cwrp->u_cmdp, + cwrp->cmd_len, scsi_rp->cmd); + else if (cwrp->cmdp) + memcpy(scsi_rp->cmd, cwrp->cmdp, cwrp->cmd_len); + else + res = -EPROTO; + if (unlikely(res)) { + kfree(long_cmdp); + return res; + } + scsi_rp->cmd_len = cwrp->cmd_len; + srp->cmd_opcode = scsi_rp->cmd[0]; us_xfer = !(srp->rq_flags & SG_FLAG_NO_DXFER); assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer); reserved = (sfp->rsv_srp == srp); @@ -2756,7 +2771,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, res = sg_mk_sgat(srp, sfp, up_sz); } - if (res) + if (unlikely(res)) goto fini; sg_set_map_data(req_schp, !!up, md); @@ -2768,7 +2783,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, struct iov_iter i; res = import_iovec(r0w, up, iov_count, 0, &iov, &i); - if (res < 0) + if (unlikely(res < 0)) goto fini; iov_iter_truncate(&i, dxfer_len); @@ -3117,7 +3132,7 @@ sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len) return n_srp; if (db_len > 0) { res = sg_mk_sgat(n_srp, sfp, db_len); - if (res) { + if (unlikely(res)) { kfree(n_srp); return ERR_PTR(res); } @@ -3151,7 +3166,7 @@ sg_build_reserve(struct sg_fd *sfp, int buflen) go_out = true; } res = sg_mk_sgat(srp, sfp, buflen); - if (res == 0) { + if (likely(res == 0)) { SG_LOG(4, sfp ? sfp->parentdp : NULL, "%s: final buflen=%d, srp=0x%p ++\n", __func__, buflen, srp); @@ -3172,29 +3187,30 @@ sg_build_reserve(struct sg_fd *sfp, int buflen) * failure returns a negated errno value twisted by ERR_PTR() macro. 
*/ static struct sg_request * -sg_add_request(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp) +sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) { bool act_empty = false; bool mk_new_srp = true; unsigned long iflags; enum sg_rq_state sr_st; + struct sg_fd *fp = cwrp->sfp; struct sg_request *r_srp = NULL; /* request to return */ struct sg_request *rsv_srp; /* current fd's reserve request */ __maybe_unused struct sg_device *sdp; __maybe_unused const char *cp; - spin_lock_irqsave(&sfp->rq_list_lock, iflags); - sdp = sfp->parentdp; - rsv_srp = sfp->rsv_srp; + spin_lock_irqsave(&fp->rq_list_lock, iflags); + sdp = fp->parentdp; + rsv_srp = fp->rsv_srp; cp = ""; /* * Check the free list (fl) for candidates. Pick zero data length * requests from the back of the fl, the rest from the front. */ - if (list_empty(&sfp->rq_fl)) { + if (list_empty(&fp->rq_fl)) { act_empty = true; } else if (dxfr_len < 1) { /* 0 data length requests at back of fl */ - list_for_each_entry_reverse(r_srp, &sfp->rq_fl, fl_entry) { + list_for_each_entry_reverse(r_srp, &fp->rq_fl, fl_entry) { sr_st = atomic_read(&r_srp->rq_st); if (sr_st == SG_RS_INACTIVE) { if (likely(sg_rstate_chg(r_srp, sr_st, @@ -3206,7 +3222,7 @@ sg_add_request(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp) } } } else { /* find request with large enough dlen */ - list_for_each_entry(r_srp, &sfp->rq_fl, fl_entry) { + list_for_each_entry(r_srp, &fp->rq_fl, fl_entry) { sr_st = atomic_read(&r_srp->rq_st); if (sr_st == SG_RS_INACTIVE && r_srp->sgat_h.buflen >= dxfr_len) { @@ -3221,19 +3237,19 @@ sg_add_request(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp) } if (mk_new_srp) { /* Need new sg_request object */ - bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, sfp->ffd_bm); + bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, fp->ffd_bm); r_srp = NULL; - if (!allow_cmd_q && !list_empty(&sfp->rq_list)) { + if (!allow_cmd_q && !list_empty(&fp->rq_list)) { r_srp = ERR_PTR(-EDOM); SG_LOG(6, sdp, "%s: trying 2nd req but cmd_q=false\n", __func__); } - spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); + spin_unlock_irqrestore(&fp->rq_list_lock, iflags); if (IS_ERR(r_srp)) /* NULL is not an ERR here */ goto err_no_lock; /* releasing rq_list_lock because next line could take time */ - r_srp = sg_mk_srp_sgat(sfp, act_empty, dxfr_len); + r_srp = sg_mk_srp_sgat(fp, act_empty, dxfr_len); if (IS_ERR(r_srp)) goto err_no_lock; cp = "new"; @@ -3251,11 +3267,11 @@ sg_add_request(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp) r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */ r_srp->cmd_opcode = 0xff; /* set invalid opcode (VS), 0x0 is TUR */ if (mk_new_srp) - spin_lock_irqsave(&sfp->rq_list_lock, iflags); - list_add_tail_rcu(&r_srp->rq_entry, &sfp->rq_list); + spin_lock_irqsave(&fp->rq_list_lock, iflags); + list_add_tail_rcu(&r_srp->rq_entry, &fp->rq_list); if (!mk_new_srp) spin_unlock(&r_srp->req_lck); - spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); + spin_unlock_irqrestore(&fp->rq_list_lock, iflags); err_no_lock: if (IS_ERR(r_srp)) SG_LOG(1, sdp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp)); @@ -3353,6 +3369,8 @@ sg_add_sfp(struct sg_device *sdp) assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q); assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN); assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT); + atomic_set(&sfp->submitted, 0); + atomic_set(&sfp->waiting, 0); /* * SG_SCATTER_SZ initializes scatter_elem_sz but different value may * be given as driver/module parameter (e.g. 
'scatter_elem_sz=8192'). @@ -3890,7 +3908,7 @@ sg_proc_dbg_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp) n += scnprintf(obp + n, len - n, " max_sgat_sz,elems=2^%d,%d excl=%d open_cnt=%d\n", ilog2(sdp->max_sgat_sz), sdp->max_sgat_elems, - sdp->exclude, sdp->open_cnt); + sdp->exclude, atomic_read(&sdp->open_cnt)); list_for_each_entry(fp, &sdp->sfds, sfd_entry) { ++*countp; rcu_read_lock(); /* assume irqs disabled */
From patchwork Fri May 24 18:48:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960329 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE2501390 for ; Fri, 24 May 2019 18:48:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D043528A22 for ; Fri, 24 May 2019 18:48:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C4DA328A25; Fri, 24 May 2019 18:48:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 10A4D28A22 for ; Fri, 24 May 2019 18:48:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391656AbfEXSsd (ORCPT ); Fri, 24 May 2019 14:48:33 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56384 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391612AbfEXSsd (ORCPT ); Fri, 24 May 2019 14:48:33 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id D4826204199; Fri, 24 May 2019 20:48:31 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6VpX4cisSR0e; Fri, 24 May 2019 20:48:29 +0200 (CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id 52E28204170; Fri, 24 May 2019 20:48:23 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 10/19] sg: add sg_ioabort ioctl Date: Fri, 24 May 2019 14:48:00 -0400 Message-Id: <20190524184809.25121-11-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP
Add ioctl(SG_IOABORT) that acts as a front-end to blk_abort_request(), which is only called if the request is "inflight". The request to abort is matched via its pack_id and the scope of the search is the current device. That scope will be fine-tuned in a later patch to be either all file descriptors belonging to the current device, or just the current file descriptor.
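As a hedged illustration (not part of the patch proper), a user-space abort of an earlier submission could look like the sketch below. It assumes the SG_IOABORT define added by this patch, struct sg_io_v4 from <linux/bsg.h>, and a file descriptor opened read-write:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/bsg.h>	/* struct sg_io_v4 */
    #include <scsi/sg.h>	/* SG_IOABORT (added below) */

    static int abort_by_pack_id(int sg_fd, int pack_id)
    {
    	struct sg_io_v4 h4;

    	memset(&h4, 0, sizeof(h4));
    	h4.guard = 'Q';			/* anything else yields EPERM */
    	h4.protocol = 0;		/* SCSI command protocol */
    	h4.subprotocol = 0;
    	h4.request_extra = pack_id;	/* matches a prior submission */
    	/* returns 0 on success; errno is ENODATA if no match found */
    	return ioctl(sg_fd, SG_IOABORT, &h4);
    }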
Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 158 +++++++++++++++++++++++++++++++++-------- include/uapi/scsi/sg.h | 3 + 2 files changed, 132 insertions(+), 29 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 638368ed9e11..7575dd66dbef 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -111,6 +111,7 @@ enum sg_rq_state { #define SG_FRQ_SYNC_INVOC 2 /* synchronous (blocking) invocation */ #define SG_FRQ_DIO_IN_USE 3 /* false->indirect_IO,mmap; 1->dio */ #define SG_FRQ_NO_US_XFER 4 /* no user space transfer of data */ +#define SG_FRQ_ABORTING 5 /* in process of aborting this cmd */ #define SG_FRQ_DEACT_ORPHAN 7 /* not keeping orphan so de-activate */ #define SG_FRQ_BLK_PUT_REQ 9 /* set when blk_put_request() called */ @@ -1566,6 +1567,97 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, return res; } +static struct sg_request * +sg_match_request(struct sg_fd *sfp, int id) +{ + bool found = false; + struct sg_request *srp; + + if (list_empty(&sfp->rq_list)) + return NULL; + if (id == SG_PACK_ID_WILDCARD) /* both wildcards same value: -1 */ + return list_first_entry_or_null(&sfp->rq_list, + struct sg_request, rq_entry); + list_for_each_entry(srp, &sfp->rq_list, rq_entry) { + if (id == srp->pack_id) { + found = true; + break; + } + } + return found ? srp : NULL; +} + +static int +sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) +{ + int res, pack_id, id; + unsigned long iflags; + struct sg_fd *o_sfp; + struct sg_request *srp; + struct sg_io_v4 io_v4; + struct sg_io_v4 *h4p = &io_v4; + + if (copy_from_user(h4p, p, SZ_SG_IO_V4)) + return -EFAULT; + if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0) + return -EPERM; + pack_id = h4p->request_extra; + id = pack_id; + + spin_lock(&sfp->rq_list_lock); + srp = sg_match_request(sfp, id); + if (srp) { + spin_lock_irqsave(&srp->req_lck, iflags); + spin_unlock(&sfp->rq_list_lock); + } else { /* assume device (not just fd) scope */ + spin_unlock(&sfp->rq_list_lock); + read_lock(&sdp->sfd_llock); + list_for_each_entry(o_sfp, &sdp->sfds, sfd_entry) { + if (o_sfp == sfp) + continue; /* already checked */ + spin_lock(&o_sfp->rq_list_lock); + srp = sg_match_request(o_sfp, id); + if (srp) { + spin_lock_irqsave(&srp->req_lck, iflags); + spin_unlock(&o_sfp->rq_list_lock); + sfp = o_sfp; + break; + } + spin_unlock(&o_sfp->rq_list_lock); + } + read_unlock(&sdp->sfd_llock); + } + if (!srp) + return -ENODATA; + + set_bit(SG_FRQ_ABORTING, srp->frq_bm); + res = 0; + switch (atomic_read(&srp->rq_st)) { + case SG_RS_BUSY: + clear_bit(SG_FRQ_ABORTING, srp->frq_bm); + res = -EBUSY; /* shouldn't occur often */ + break; + case SG_RS_INACTIVE: /* inactive on rq_list not good */ + clear_bit(SG_FRQ_ABORTING, srp->frq_bm); + res = -EPROTO; + break; + case SG_RS_AWAIT_RD: /* user should still do completion */ + case SG_RS_DONE_RD: + clear_bit(SG_FRQ_ABORTING, srp->frq_bm); + break; /* nothing to do here, return 0 */ + case SG_RS_INFLIGHT: /* only attempt abort if inflight */ + srp->rq_result |= (DRIVER_SOFT << 24); + if (srp->rq) + blk_abort_request(srp->rq); + break; + default: + clear_bit(SG_FRQ_ABORTING, srp->frq_bm); + break; + } + spin_unlock_irqrestore(&srp->req_lck, iflags); + return res; +} + /* * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and * <= max_segment_size. 
Exit if that is the same as old size; otherwise @@ -1725,7 +1817,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE); bool check_detach = false; int val; - int result = 0; + int res = 0; void __user *p = uptr64(arg); int __user *ip = p; struct sg_device *sdp; @@ -1753,13 +1845,21 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_IORECEIVE: SG_LOG(3, sdp, "%s: SG_IORECEIVE\n", __func__); return sg_ctl_ioreceive(filp, sfp, p); + case SG_IOABORT: + SG_LOG(3, sdp, "%s: SG_IOABORT\n", __func__); + if (read_only) + return -EPERM; + mutex_lock(&sfp->f_mutex); + res = sg_ctl_abort(sdp, sfp, p); + mutex_unlock(&sfp->f_mutex); + return res; case SG_GET_SCSI_ID: return sg_ctl_scsi_id(sdev, sdp, p); case SG_SET_FORCE_PACK_ID: SG_LOG(3, sdp, "%s: SG_SET_FORCE_PACK_ID\n", __func__); - result = get_user(val, ip); - if (unlikely(result)) - return result; + res = get_user(val, ip); + if (unlikely(res)) + return res; assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val); return 0; case SG_GET_PACK_ID: /* or tag of oldest "read"-able, -1 if none */ @@ -1787,31 +1887,31 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return put_user(sdp->max_sgat_elems, ip); case SG_SET_RESERVED_SIZE: mutex_lock(&sfp->f_mutex); - result = get_user(val, ip); - if (likely(!result)) { + res = get_user(val, ip); + if (likely(!res)) { if (val >= 0 && val <= (1024 * 1024 * 1024)) { - result = sg_set_reserved_sz(sfp, val); + res = sg_set_reserved_sz(sfp, val); } else { SG_LOG(3, sdp, "%s: invalid size\n", __func__); - result = -EINVAL; + res = -EINVAL; } } mutex_unlock(&sfp->f_mutex); - return result; + return res; case SG_GET_RESERVED_SIZE: mutex_lock(&sfp->f_mutex); val = min_t(int, sfp->rsv_srp->sgat_h.buflen, sdp->max_sgat_sz); SG_LOG(3, sdp, "%s: SG_GET_RESERVED_SIZE=%d\n", __func__, val); - result = put_user(val, ip); + res = put_user(val, ip); mutex_unlock(&sfp->f_mutex); - return result; + return res; case SG_SET_COMMAND_Q: SG_LOG(3, sdp, "%s: SG_SET_COMMAND_Q\n", __func__); - result = get_user(val, ip); - if (unlikely(result)) - return result; + res = get_user(val, ip); + if (unlikely(res)) + return res; assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val); return 0; case SG_GET_COMMAND_Q: @@ -1819,9 +1919,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return put_user(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm), ip); case SG_SET_KEEP_ORPHAN: SG_LOG(3, sdp, "%s: SG_SET_KEEP_ORPHAN\n", __func__); - result = get_user(val, ip); - if (unlikely(result)) - return result; + res = get_user(val, ip); + if (unlikely(res)) + return res; assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, !!val); return 0; case SG_GET_KEEP_ORPHAN: @@ -1839,9 +1939,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) break; case SG_SET_TIMEOUT: SG_LOG(3, sdp, "%s: SG_SET_TIMEOUT\n", __func__); - result = get_user(val, ip); - if (unlikely(result)) - return result; + res = get_user(val, ip); + if (unlikely(res)) + return res; if (val < 0) return -EIO; if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ)) @@ -1867,9 +1967,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return put_user((int)sdev->host->unchecked_isa_dma, ip); case SG_NEXT_CMD_LEN: /* active only in v2 interface */ SG_LOG(3, sdp, "%s: SG_NEXT_CMD_LEN\n", __func__); - result = get_user(val, ip); - if (unlikely(result)) - return result; + res = get_user(val, ip); + if (unlikely(res)) + return res; if (val > SG_MAX_CDB_SIZE) 
return -ENOMEM; mutex_lock(&sfp->f_mutex); @@ -1894,9 +1994,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) filp->f_mode, p); case SG_SET_DEBUG: SG_LOG(3, sdp, "%s: SG_SET_DEBUG\n", __func__); - result = get_user(val, ip); - if (unlikely(result)) - return result; + res = get_user(val, ip); + if (unlikely(res)) + return res; sdp->sgdebug = (u8)val; return 0; case BLKSECTGET: @@ -1952,9 +2052,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; } - result = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NDELAY)); - if (unlikely(result)) - return result; + res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NDELAY)); + if (unlikely(res)) + return res; /* ioctl that reach here are forwarded to the mid-level */ return scsi_ioctl(sdev, cmd_in, p); } diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index db86d1ae7e29..cd58f918a784 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -356,6 +356,9 @@ struct sg_header { /* Gives some v4 identifying info to driver, receives associated response */ #define SG_IORECEIVE _IOWR(SG_IOCTL_MAGIC_NUM, 0x42, struct sg_io_v4) +/* Provides identifying info about a prior submission (e.g. a tag) */ +#define SG_IOABORT _IOW(SG_IOCTL_MAGIC_NUM, 0x43, struct sg_io_v4) + /* command queuing is always on when the v3 or v4 interface is used */ #define SG_DEF_COMMAND_Q 0 From patchwork Fri May 24 18:48:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960333 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 76B10933 for ; Fri, 24 May 2019 18:48:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 68F4128A21 for ; Fri, 24 May 2019 18:48:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5DA4328A26; Fri, 24 May 2019 18:48:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C82BE28A21 for ; Fri, 24 May 2019 18:48:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391666AbfEXSsg (ORCPT ); Fri, 24 May 2019 14:48:36 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56397 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391612AbfEXSsf (ORCPT ); Fri, 24 May 2019 14:48:35 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 6E74820418A; Fri, 24 May 2019 20:48:33 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YRD6jubt7nos; Fri, 24 May 2019 20:48:31 +0200 (CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id 7BB42204197; Fri, 24 May 2019 20:48:24 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: 
martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 11/19] sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls Date: Fri, 24 May 2019 14:48:01 -0400 Message-Id: <20190524184809.25121-12-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP
Add ioctl(SG_IOSUBMIT_V3) and ioctl(SG_IORECEIVE_V3). These ioctls are meant to be (almost) drop-in replacements for the write()/read() async version 3 interface. They only accept the version 3 interface. See the webpage at: http://sg.danny.cz/sg/sg_v40.html, specifically the table in the section titled: "12 SG interface support changes". If sgv3 is a struct sg_io_hdr object, suitably configured, then res = write(sg_fd, &sgv3, sizeof(sgv3)); and res = ioctl(sg_fd, SG_IOSUBMIT_V3, &sgv3); are equivalent. Ditto for read() and ioctl(SG_IORECEIVE_V3). Note this inconsistency: for async/non-blocking usage, only the v4 interface objects can use ioctl(SG_IOSUBMIT) and (SG_IORECEIVE) while only v3 interface objects can use ioctl(SG_IOSUBMIT_V3) and (SG_IORECEIVE_V3). However, with sync usage both v3 and v4 interface objects can use ioctl(SG_IO). The former action is preferred but existing practice for sync usage (i.e. the current sg driver supporting the v3 interface object with ioctl(SG_IO) while the bsg driver continues to support the v4 interface object with ioctl(SG_IO)) dictates the dual nature of ioctl(SG_IO).
Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 81 ++++++++++++++++++++++++++++++++++++++++-- include/uapi/scsi/sg.h | 6 ++++ 2 files changed, 84 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 7575dd66dbef..a261ad2c4567 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -12,9 +12,9 @@ * */ -static int sg_version_num = 30536; /* 2 digits for each component */ -#define SG_VERSION_STR "3.5.36" -static char *sg_version_date = "20140603"; +static int sg_version_num = 40001; /* 2 digits for each component */ +#define SG_VERSION_STR "4.0.01" +static char *sg_version_date = "20190520"; #include @@ -815,6 +815,24 @@ sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p) return -EPERM; } +static int +sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p) +{ + int res; + u8 hdr_store[SZ_SG_IO_V4]; /* max(v3interface, v4interface) */ + struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store; + struct sg_device *sdp = sfp->parentdp; + + res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK)); + if (unlikely(res)) + return res; + if (copy_from_user(h3p, p, SZ_SG_IO_HDR)) + return -EFAULT; + if (h3p->interface_id == 'S') + return sg_v3_submit(filp, sfp, h3p, false, NULL); + return -EPERM; +} + static void sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) { @@ -1188,6 +1206,57 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) return sg_v4_receive(sfp, srp, p, h4p); } +/* + * Called when ioctl(SG_IORECEIVE_V3) received. Expects a v3 interface. + * Checks if O_NONBLOCK file flag given, if not checks given flags field + * to see if SGV4_FLAG_IMMED is set. Either of these implies non blocking. + * When non-blocking and there is no request waiting, yields EAGAIN; + * otherwise it waits.
+ */ +static int +sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p) +{ + bool non_block = !!(filp->f_flags & O_NONBLOCK); + int res; + int pack_id = SG_PACK_ID_WILDCARD; + u8 v3_holder[SZ_SG_IO_HDR]; + struct sg_io_hdr *h3p = (struct sg_io_hdr *)v3_holder; + struct sg_device *sdp = sfp->parentdp; + struct sg_request *srp; + + res = sg_allow_if_err_recovery(sdp, non_block); + if (unlikely(res)) + return res; + /* Get first three 32 bit integers: guard, proto+subproto */ + if (copy_from_user(h3p, p, SZ_SG_IO_HDR)) + return -EFAULT; + /* for v3: interface_id=='S' (in a 32 bit int) */ + if (h3p->interface_id != 'S') + return -EPERM; + if (h3p->flags & SGV4_FLAG_IMMED) + non_block = true; /* set by either this or O_NONBLOCK */ + SG_LOG(3, sdp, "%s: non_block(+IMMED)=%d\n", __func__, non_block); + + if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) + pack_id = h3p->pack_id; + + srp = sg_find_srp_by_id(sfp, pack_id); + if (!srp) { /* nothing available so wait on packet or */ + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + if (non_block) + return -EAGAIN; + res = wait_event_interruptible + (sfp->read_wait, + sg_get_ready_srp(sfp, &srp, pack_id)); + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + if (unlikely(res)) + return res; /* signal --> -ERESTARTSYS */ + } /* now srp should be valid */ + return sg_v3_receive(sfp, srp, p); +} + static int sg_rd_v1v2(void __user *buf, int count, struct sg_fd *sfp, struct sg_request *srp) @@ -1842,9 +1911,15 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_IOSUBMIT: SG_LOG(3, sdp, "%s: SG_IOSUBMIT\n", __func__); return sg_ctl_iosubmit(filp, sfp, p); + case SG_IOSUBMIT_V3: + SG_LOG(3, sdp, "%s: SG_IOSUBMIT_V3\n", __func__); + return sg_ctl_iosubmit_v3(filp, sfp, p); case SG_IORECEIVE: SG_LOG(3, sdp, "%s: SG_IORECEIVE\n", __func__); return sg_ctl_ioreceive(filp, sfp, p); + case SG_IORECEIVE_V3: + SG_LOG(3, sdp, "%s: SG_IORECEIVE_V3\n", __func__); + return sg_ctl_ioreceive_v3(filp, sfp, p); case SG_IOABORT: SG_LOG(3, sdp, "%s: SG_IOABORT\n", __func__); if (read_only) diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index cd58f918a784..8181e36442a8 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -359,6 +359,12 @@ struct sg_header { /* Provides identifying info about a prior submission (e.g. 
a tag) */ #define SG_IOABORT _IOW(SG_IOCTL_MAGIC_NUM, 0x43, struct sg_io_v4) +/* Submits a v3 interface object to driver */ +#define SG_IOSUBMIT_V3 _IOWR(SG_IOCTL_MAGIC_NUM, 0x45, struct sg_io_hdr) + +/* Gives some v3 identifying info to driver, receives associated response */ +#define SG_IORECEIVE_V3 _IOWR(SG_IOCTL_MAGIC_NUM, 0x46, struct sg_io_hdr) + /* command queuing is always on when the v3 or v4 interface is used */ #define SG_DEF_COMMAND_Q 0
From patchwork Fri May 24 18:48:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960339 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 19166933 for ; Fri, 24 May 2019 18:48:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0927928A22 for ; Fri, 24 May 2019 18:48:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EF8AE28A25; Fri, 24 May 2019 18:48:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 856B628A25 for ; Fri, 24 May 2019 18:48:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2403785AbfEXSsj (ORCPT ); Fri, 24 May 2019 14:48:39 -0400 Received: from smtp.infotech.no ([82.134.31.41]:56410 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391612AbfEXSsi (ORCPT ); Fri, 24 May 2019 14:48:38 -0400 Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id 08CF2204165; Fri, 24 May 2019 20:48:36 +0200 (CEST) X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KBsTs+w-ug-B; Fri, 24 May 2019 20:48:32 +0200 (CEST) Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id A595320417E; Fri, 24 May 2019 20:48:25 +0200 (CEST) From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 12/19] sg: add sg_set_get_extended ioctl Date: Fri, 24 May 2019 14:48:02 -0400 Message-Id: <20190524184809.25121-13-dgilbert@interlog.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP
Add ioctl(SG_SET_GET_EXTENDED) together with its interface, struct sg_extended_info, which is 96 bytes long; only half of it is currently used. The "SET_GET" component of the name stresses that data flows both to and from the ioctl.
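As a rough user-space sketch of that two-way flow (illustrative only: SG_SET_GET_EXTENDED, SG_SEIM_CTL_FLAGS, SG_CTL_FLAGM_TIME_IN_NS and the ctl_flags* fields come from this patch's uapi additions; the top-level sei_wr_mask/sei_rd_mask names are assumed from the mask handling in sg_ctl_extended() shown below):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>	/* must carry this patch's uapi additions */

    static int set_and_check_time_in_ns(int sg_fd)
    {
    	struct sg_extended_info sei;

    	memset(&sei, 0, sizeof(sei));
    	sei.sei_wr_mask = SG_SEIM_CTL_FLAGS;	/* select flags section */
    	sei.sei_rd_mask = SG_SEIM_CTL_FLAGS;
    	sei.ctl_flags_wr_mask = SG_CTL_FLAGM_TIME_IN_NS;
    	sei.ctl_flags_rd_mask = SG_CTL_FLAGM_TIME_IN_NS;
    	sei.ctl_flags = SG_CTL_FLAGM_TIME_IN_NS;	/* 1 --> ns durations */
    	if (ioctl(sg_fd, SG_SET_GET_EXTENDED, &sei) < 0)
    		return -1;
    	/* driver echoes the now-current setting back in ctl_flags */
    	return !!(sei.ctl_flags & SG_CTL_FLAGM_TIME_IN_NS);
    }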
That ioctl has three sections: one for getting and setting 32 bit quantities, a second section for manipulating boolean (bit) flags, and a final section for reading 32 bit quantities where a well known value is written and the corresponding value is read back. Several settings can be made in one invocation. See the webpage at: http://sg.danny.cz/sg/sg_v40.html specifically the section titled: "13 IOCTLs". Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 252 +++++++++++++++++++++++++++++++++++++---- include/uapi/scsi/sg.h | 67 +++++++++++ 2 files changed, 295 insertions(+), 24 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index a261ad2c4567..c014fb24eca1 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -41,9 +41,9 @@ static char *sg_version_date = "20190520"; #include #include #include -#include /* for sg_check_file_access() */ +#include /* for sg_check_file_access() */ #include -#include +#include /* used if CONFIG_SCSI_PROC_FS */ #include #include @@ -120,7 +120,9 @@ enum sg_rq_state { #define SG_FFD_CMD_Q 1 /* clear: only 1 active req per fd */ #define SG_FFD_KEEP_ORPHAN 2 /* policy for this fd */ #define SG_FFD_MMAP_CALLED 3 /* mmap(2) system call made on fd */ +#define SG_FFD_TIME_IN_NS 4 /* set: time in nanoseconds, else ms */ #define SG_FFD_Q_AT_TAIL 5 /* set: queue reqs at tail of blk q */ +#define SG_FFD_NO_DURATION 9 /* don't do command duration calc */ int sg_big_buff = SG_DEF_RESERVED_SIZE; @@ -271,13 +273,12 @@ static void sg_rq_end_io(struct request *rq, blk_status_t status); static int sg_proc_init(void); static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir); -// static int sg_finish_rem_req(struct sg_request *srp); static void sg_finish_scsi_blk_rq(struct sg_request *srp); static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen); static int sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_request **o_srp); -static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwp); +static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp); static int sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer); static void sg_remove_sgat(struct sg_request *srp); @@ -302,6 +303,7 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr)) /* v3 header */ #define SZ_SG_IO_V4 ((int)sizeof(struct sg_io_v4)) /* v4 header (in bsg.h) */ #define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info)) +#define SZ_SG_EXTENDED_INFO ((int)sizeof(struct sg_extended_info)) /* There is a assert that SZ_SG_IO_V4 >= SZ_SG_IO_HDR in first function */ @@ -440,11 +442,11 @@ static int sg_open(struct inode *inode, struct file *filp) { bool o_excl, non_block; + int res; int min_dev = iminor(inode); int op_flags = filp->f_flags; struct sg_device *sdp; struct sg_fd *sfp; - int res; nonseekable_open(inode, filp); o_excl = !!(op_flags & O_EXCL); @@ -508,7 +510,6 @@ sg_open(struct inode *inode, struct file *filp) filp->private_data = sfp; sfp->tid = (current ? current->pid : -1); - sfp->tid = (current ? 
current->pid : -1); atomic_inc(&sdp->open_cnt); mutex_unlock(&sdp->open_rel_lock); @@ -842,7 +843,10 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm); sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); SG_LOG(3, sdp, "%s: is_v4h=%d\n", __func__, (int)is_v4h); - srp->start_ns = ktime_get_boot_ns(); + if (test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm)) + srp->start_ns = 0; + else + srp->start_ns = ktime_get_boot_ns(); /* assume always > 0 */ srp->duration = 0; if (!is_v4h && srp->s_hdr3.interface_id == '\0') @@ -1473,29 +1477,42 @@ sg_calc_sgat_param(struct sg_device *sdp) sdp->max_sgat_sz = sz; } +/* + * Returns duration since srp->start_ns (using boot time as an epoch). Unit + * is nanoseconds when time_in_ns==true; else it is in milliseconds. + * For backward compatibility the duration is placed in a 32 bit unsigned + * integer. This limits the maximum nanosecond duration that can be + * represented (without wrapping) to about 4.3 seconds. If that is exceeded + * return equivalent of 3.999.. secs as it is more eye catching than the real + * number. Negative durations should not be possible but if they occur set + * duration to an unlikely 2 nanosec. Stalls in a request setup will have + * ts0==U64_MAX and will return 1 for an unlikely 1 nanosecond duration. + */ static u32 -sg_calc_rq_dur(const struct sg_request *srp) +sg_calc_rq_dur(const struct sg_request *srp, bool time_in_ns) { ktime_t ts0 = srp->start_ns; ktime_t now_ts; s64 diff; - if (ts0 == 0) + if (ts0 == 0) /* only when SG_FFD_NO_DURATION is set */ return 0; if (unlikely(ts0 == U64_MAX)) /* _prior_ to issuing req */ - return 999999999; /* eye catching */ + return time_in_ns ? 1 : 999999999; now_ts = ktime_get_boot_ns(); if (unlikely(ts0 > now_ts)) - return 999999998; - /* unlikely req duration will exceed 2**32 milliseconds */ - diff = ktime_ms_delta(now_ts, ts0); + return time_in_ns ? 2 : 999999998; + if (time_in_ns) + diff = ktime_to_ns(ktime_sub(now_ts, ts0)); + else /* unlikely req duration will exceed 2**32 milliseconds */ + diff = ktime_ms_delta(now_ts, ts0); return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff; } /* Return of U32_MAX means srp is inactive or in slave waiting state */ static u32 sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp, - bool *is_durp) + bool time_in_ns, bool *is_durp) { bool is_dur = false; u32 res = U32_MAX; @@ -1503,7 +1520,7 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp, switch (sr_stp ? *sr_stp : atomic_read(&srp->rq_st)) { case SG_RS_INFLIGHT: case SG_RS_BUSY: - res = sg_calc_rq_dur(srp); + res = sg_calc_rq_dur(srp, time_in_ns); break; case SG_RS_AWAIT_RD: case SG_RS_DONE_RD: @@ -1525,7 +1542,8 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, __must_hold(&sfp->rq_list_lock) { spin_lock(&srp->req_lck); - rip->duration = sg_get_dur(srp, NULL, NULL); + rip->duration = sg_get_dur(srp, NULL, test_bit(SG_FFD_TIME_IN_NS, + sfp->ffd_bm), NULL); if (rip->duration == U32_MAX) rip->duration = 0; rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm); @@ -1802,6 +1820,177 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) return res; } +/* + * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set + * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only; + * [wo], write-only; [raw], read after write; [rbw], read before write. 
+ */ +static void +sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) +{ + bool flg = false; + u32 c_flgs_wm = seip->ctl_flags_wr_mask; + u32 c_flgs_rm = seip->ctl_flags_rd_mask; + u32 *c_flgsp = &seip->ctl_flags; + struct sg_device *sdp = sfp->parentdp; + + /* TIME_IN_NS boolean, [raw] time in nanoseconds (def: millisecs) */ + if (c_flgs_wm & SG_CTL_FLAGM_TIME_IN_NS) + assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm, + !!(*c_flgsp & SG_CTL_FLAGM_TIME_IN_NS)); + if (c_flgs_rm & SG_CTL_FLAGM_TIME_IN_NS) { + if (test_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm)) + *c_flgsp |= SG_CTL_FLAGM_TIME_IN_NS; + else + *c_flgsp &= ~SG_CTL_FLAGM_TIME_IN_NS; + } + /* OTHER_OPENS boolean, [ro] any other sg open fds on this dev? */ + if (c_flgs_rm & SG_CTL_FLAGM_OTHER_OPENS) { + if (atomic_read(&sdp->open_cnt) > 1) + *c_flgsp |= SG_CTL_FLAGM_OTHER_OPENS; + else + *c_flgsp &= ~SG_CTL_FLAGM_OTHER_OPENS; + } + /* Q_TAIL boolean, [raw] 1: queue at tail; 0: head (def: depends) */ + if (c_flgs_wm & SG_CTL_FLAGM_Q_TAIL) + assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, + !!(*c_flgsp & SG_CTL_FLAGM_Q_TAIL)); + if (c_flgs_rm & SG_CTL_FLAGM_Q_TAIL) { + if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm)) + *c_flgsp |= SG_CTL_FLAGM_Q_TAIL; + else + *c_flgsp &= ~SG_CTL_FLAGM_Q_TAIL; + } + /* NO_DURATION boolean, [rbw] */ + if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION) + flg = test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm); + if (c_flgs_wm & SG_CTL_FLAGM_NO_DURATION) + assign_bit(SG_FFD_NO_DURATION, sfp->ffd_bm, + !!(*c_flgsp & SG_CTL_FLAGM_NO_DURATION)); + if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION) { + if (flg) + *c_flgsp |= SG_CTL_FLAGM_NO_DURATION; + else + *c_flgsp &= ~SG_CTL_FLAGM_NO_DURATION; + } +} + +static void +sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip) +{ + u32 uv; + unsigned long iflags; + struct sg_fd *a_sfp; + struct sg_device *sdp = sfp->parentdp; + struct sg_request *srp; + + switch (seip->read_value) { + case SG_SEIRV_INT_MASK: + seip->read_value = SG_SEIM_ALL_BITS; + break; + case SG_SEIRV_BOOL_MASK: + seip->read_value = SG_CTL_FLAGM_ALL_BITS; + break; + case SG_SEIRV_VERS_NUM: + seip->read_value = sg_version_num; + break; + case SG_SEIRV_FL_RQS: + uv = 0; + rcu_read_lock(); + list_for_each_entry_rcu(srp, &sfp->rq_fl, fl_entry) + ++uv; + rcu_read_unlock(); + seip->read_value = uv; + break; + case SG_SEIRV_DEV_FL_RQS: + uv = 0; + read_lock_irqsave(&sdp->sfd_llock, iflags); + list_for_each_entry(a_sfp, &sdp->sfds, sfd_entry) { + rcu_read_lock(); + list_for_each_entry_rcu(srp, &a_sfp->rq_fl, fl_entry) + ++uv; + rcu_read_unlock(); + } + read_unlock_irqrestore(&sdp->sfd_llock, iflags); + seip->read_value = uv; + break; + case SG_SEIRV_SUBMITTED: /* counts all non-blocking on active list */ + seip->read_value = (u32)atomic_read(&sfp->submitted); + break; + case SG_SEIRV_DEV_SUBMITTED: /* sum(submitted) on all fd's siblings */ + uv = 0; + read_lock_irqsave(&sdp->sfd_llock, iflags); + list_for_each_entry(a_sfp, &sdp->sfds, sfd_entry) + uv += (u32)atomic_read(&a_sfp->submitted); + read_unlock_irqrestore(&sdp->sfd_llock, iflags); + seip->read_value = uv; + break; + default: + SG_LOG(6, sdp, "%s: can't decode %d --> read_value\n", + __func__, seip->read_value); + seip->read_value = 0; + break; + } +} + +/* Called when processing ioctl(SG_SET_GET_EXTENDED) */ +static int +sg_ctl_extended(struct sg_fd *sfp, void __user *p) +{ + int result = 0; + int ret = 0; + int s_wr_mask, s_rd_mask; + u32 or_masks; + struct sg_device *sdp = sfp->parentdp; + struct sg_extended_info *seip; + struct sg_extended_info 
sei; + + seip = &sei; + if (copy_from_user(seip, p, SZ_SG_EXTENDED_INFO)) + return -EFAULT; + s_wr_mask = seip->sei_wr_mask; + s_rd_mask = seip->sei_rd_mask; + or_masks = s_wr_mask | s_rd_mask; + if (or_masks == 0) { + SG_LOG(2, sdp, "%s: both masks 0, do nothing\n", __func__); + return 0; + } + SG_LOG(3, sdp, "%s: wr_mask=0x%x rd_mask=0x%x\n", __func__, s_wr_mask, + s_rd_mask); + /* check all boolean flags for either wr or rd mask set in or_mask */ + if (or_masks & SG_SEIM_CTL_FLAGS) + sg_extended_bool_flags(sfp, seip); + /* yields minor_index (type: u32) [ro] */ + if (or_masks & SG_SEIM_MINOR_INDEX) { + if (s_wr_mask & SG_SEIM_MINOR_INDEX) { + SG_LOG(2, sdp, "%s: writing to minor_index ignored\n", + __func__); + } + if (s_rd_mask & SG_SEIM_MINOR_INDEX) + seip->minor_index = sdp->index; + } + if ((s_rd_mask & SG_SEIM_READ_VAL) && (s_wr_mask & SG_SEIM_READ_VAL)) + sg_extended_read_value(sfp, seip); + /* reserved_sz [raw], since may be reduced by other limits */ + if (s_wr_mask & SG_SEIM_RESERVED_SIZE) { + mutex_lock(&sfp->f_mutex); + result = sg_set_reserved_sz(sfp, (int)seip->reserved_sz); + if (ret == 0 && result) + ret = result; + mutex_unlock(&sfp->f_mutex); + } + if (s_rd_mask & SG_SEIM_RESERVED_SIZE) + seip->reserved_sz = (u32)min_t(int, + sfp->rsv_srp->sgat_h.buflen, + sdp->max_sgat_sz); + /* copy to user space if int or boolean read mask non-zero */ + if (s_rd_mask || seip->ctl_flags_rd_mask) { + if (copy_to_user(p, seip, SZ_SG_EXTENDED_INFO)) + ret = ret ? ret : -EFAULT; + } + return ret; +} + /* * For backward compatibility, output SG_MAX_QUEUE sg_req_info objects. First * fetch from the active list then, if there is still room, from the free @@ -1928,6 +2117,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) res = sg_ctl_abort(sdp, sfp, p); mutex_unlock(&sfp->f_mutex); return res; + case SG_SET_GET_EXTENDED: + SG_LOG(3, sdp, "%s: SG_SET_GET_EXTENDED\n", __func__); + return sg_ctl_extended(sfp, p); case SG_GET_SCSI_ID: return sg_ctl_scsi_id(sdev, sdp, p); case SG_SET_FORCE_PACK_ID: @@ -2452,7 +2644,9 @@ sg_rq_end_io(struct request *rq, blk_status_t status) SG_LOG(6, sdp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id, srp->rq_result); - srp->duration = sg_calc_rq_dur(srp); + if (srp->start_ns > 0) /* zero only when SG_FFD_NO_DURATION is set */ + srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS, + sfp->ffd_bm)); if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) && slen > 0)) sg_check_sense(sdp, srp, slen); if (slen > 0) { @@ -3441,6 +3635,9 @@ sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) r_srp->frq_bm[0] = cwrp->frq_bm[0]; /* assumes <= 32 req flags */ r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */ r_srp->cmd_opcode = 0xff; /* set invalid opcode (VS), 0x0 is TUR */ + /* If setup stalls (e.g. 
blk_get_request()) debug shows 'elap=1 ns' */ + if (test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm)) + r_srp->start_ns = U64_MAX; if (mk_new_srp) spin_lock_irqsave(&fp->rq_list_lock, iflags); list_add_tail_rcu(&r_srp->rq_entry, &fp->rq_list); @@ -3543,6 +3740,7 @@ sg_add_sfp(struct sg_device *sdp) assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID); assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q); assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN); + assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm, SG_DEF_TIME_UNIT); assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT); atomic_set(&sfp->submitted, 0); atomic_set(&sfp->waiting, 0); @@ -3979,7 +4177,8 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v) /* Writes debug info for one sg_request in obp buffer */ static int -sg_proc_dbg_sreq(struct sg_request *srp, int to, char *obp, int len) +sg_proc_dbg_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp, + int len) __must_hold(&srp->req_lck) { bool is_v3v4, v4, is_dur; @@ -3987,6 +4186,7 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, char *obp, int len) u32 dur; enum sg_rq_state rq_st; const char *cp; + const char *tp = t_in_ns ? "ns" : "ms"; if (len < 1) return 0; @@ -3999,15 +4199,15 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, char *obp, int len) cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ? " dio>> " : " "; rq_st = atomic_read(&srp->rq_st); - dur = sg_get_dur(srp, &rq_st, &is_dur); + dur = sg_get_dur(srp, &rq_st, t_in_ns, &is_dur); n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp, sg_rq_st_str(rq_st, false), srp->sgat_h.dlen, srp->sgat_h.buflen, (int)srp->pack_id); if (is_dur) /* cmd/req has completed, waiting for ... */ - n += scnprintf(obp + n, len - n, " dur=%ums", dur); + n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp); else if (dur < U32_MAX) /* in-flight or busy (so ongoing) */ - n += scnprintf(obp + n, len - n, " t_o/elap=%us/%ums", - to / 1000, dur); + n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s", + to / 1000, dur, tp); n += scnprintf(obp + n, len - n, " sgat=%d op=0x%02x\n", srp->sgat_h.num_sgat, srp->cmd_opcode); return n; @@ -4045,7 +4245,9 @@ sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) atomic_read(&fp->waiting), fp->tid); list_for_each_entry_rcu(srp, &fp->rq_list, rq_entry) { spin_lock(&srp->req_lck); - n += sg_proc_dbg_sreq(srp, fp->timeout, obp + n, len - n); + n += sg_proc_dbg_sreq(srp, fp->timeout, + test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm), + obp + n, len - n); spin_unlock(&srp->req_lck); } if (list_empty(&fp->rq_list)) @@ -4057,7 +4259,9 @@ sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) first_fl = false; } spin_lock(&srp->req_lck); - n += sg_proc_dbg_sreq(srp, fp->timeout, obp + n, len - n); + n += sg_proc_dbg_sreq(srp, fp->timeout, + test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm), + obp + n, len - n); spin_unlock(&srp->req_lck); } return n; diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index 8181e36442a8..ca2b4819ddcd 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -154,6 +154,70 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ int unused; } sg_req_info_t; +/* + * The following defines are for manipulating struct sg_extended_info which + * is abbreviated to "SEI". A following "M" (i.e. "_SEIM_") indicates a + * mask. Most mask values correspond to a integer (usually a uint32_t) apart + * from SG_SEIM_CTL_FLAGS which is for boolean values packed into an integer. 
+ * The mask values for those booleans start with "SG_CTL_FLAGM_". The scope + * of these settings, like most other ioctls, is usually that of the file + * descriptor the ioctl is executed on. The "rd:" indication means read-only, + * attempts to write to them are ignored. "rd>" means action when reading. + */ +#define SG_SEIM_CTL_FLAGS 0x1 /* ctl_flags_mask bits in ctl_flags */ +#define SG_SEIM_READ_VAL 0x2 /* write SG_SEIRV_*, read back value */ +#define SG_SEIM_RESERVED_SIZE 0x4 /* reserved_sz of reserve request */ +#define SG_SEIM_MINOR_INDEX 0x10 /* sg device minor index number */ +#define SG_SEIM_SGAT_ELEM_SZ 0x80 /* sgat element size (>= PAGE_SIZE) */ +#define SG_SEIM_ALL_BITS 0xff /* should be OR of previous items */ + +/* flag and mask values for boolean fields follow */ +#define SG_CTL_FLAGM_TIME_IN_NS 0x1 /* time: nanosecs (def: millisecs) */ +#define SG_CTL_FLAGM_OTHER_OPENS 0x4 /* rd: other sg fd_s on this dev */ +#define SG_CTL_FLAGM_ORPHANS 0x8 /* rd: orphaned requests on this fd */ +#define SG_CTL_FLAGM_Q_TAIL 0x10 /* used for future cmds on this fd */ +#define SG_CTL_FLAGM_NO_DURATION 0x400 /* don't calc command duration */ +#define SG_CTL_FLAGM_ALL_BITS 0xfff /* should be OR of previous items */ + +/* Write one of the following values to sg_extended_info::read_value, get... */ +#define SG_SEIRV_INT_MASK 0x0 /* get SG_SEIM_ALL_BITS */ +#define SG_SEIRV_BOOL_MASK 0x1 /* get SG_CTL_FLAGM_ALL_BITS */ +#define SG_SEIRV_VERS_NUM 0x2 /* get driver version number as int */ +#define SG_SEIRV_FL_RQS 0x3 /* number of requests in free list */ +#define SG_SEIRV_DEV_FL_RQS 0x4 /* sum(fl rqs) on all dev's fds */ +#define SG_SEIRV_SUBMITTED 0x5 /* number of mrqs submitted+unread */ +#define SG_SEIRV_DEV_SUBMITTED 0x6 /* sum(submitted) on all dev's fds */ + +/* + * A pointer to the following structure is passed as the third argument to + * ioctl(SG_SET_GET_EXTENDED). Each bit in the *_wr_mask fields causes the + * corresponding integer (e.g. reserved_sz) or bit (e.g. the + * SG_CTL_FLAG_TIME_IN_NS bit in ctl_flags) to be read from the user space + * and modify the driver. Each bit in the *_rd_mask fields causes the + * corresponding integer or bit to be fetched from the driver and written + * back to the user space. If the same bit is set in both the *_wr_mask and + * corresponding *_rd_mask fields, then which one comes first depends on the + * setting but no other operation will split the two. This structure is + * padded to 96 bytes to allow for new values to be added in the future. + */ + +/* If both sei_wr_mask and sei_rd_mask are 0, this ioctl does nothing */ +struct sg_extended_info { + __u32 sei_wr_mask; /* OR-ed SG_SEIM_* user->driver values */ + __u32 sei_rd_mask; /* OR-ed SG_SEIM_* driver->user values */ + __u32 ctl_flags_wr_mask; /* OR-ed SG_CTL_FLAGM_* values */ + __u32 ctl_flags_rd_mask; /* OR-ed SG_CTL_FLAGM_* values */ + __u32 ctl_flags; /* bit values OR-ed, see SG_CTL_FLAGM_* */ + __u32 read_value; /* write SG_SEIRV_*, read back related */ + + __u32 reserved_sz; /* data/sgl size of pre-allocated request */ + __u32 tot_fd_thresh; /* total data/sgat for this fd, 0: no limit */ + __u32 minor_index; /* rd: kernel's sg device minor number */ + __u32 share_fd; /* SHARE_FD and CHG_SHARE_FD use this */ + __u32 sgat_elem_sz; /* sgat element size (must be power of 2) */ + __u8 pad_to_96[52]; /* pad so struct is 96 bytes long */ +}; + /* * IOCTLs: Those ioctls that are relevant to the SG 3.x drivers follow. * [Those that only apply to the SG 2.x drivers are at the end of the file.] 
@@ -183,6 +247,9 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ */ #define SG_IOCTL_MAGIC_NUM 0x22 +#define SG_SET_GET_EXTENDED _IOWR(SG_IOCTL_MAGIC_NUM, 0x51, \ + struct sg_extended_info) + /* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */ #define SG_GET_SCSI_ID 0x2276 /* Yields fd's bus, chan, dev, lun + type */ /* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */

From patchwork Fri May 24 18:48:03 2019 X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960335 From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 13/19] sg: sgat_elem_sz and sum_fd_dlens Date: Fri, 24 May 2019 14:48:03 -0400 Message-Id: <20190524184809.25121-14-dgilbert@interlog.com> In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com>

Wire up some more capabilities of ioctl(SG_SET_GET_EXTENDED). One is the size of each internal scatter gather list element. This defaults to 2^15 bytes and was fixed (not settable) in previous versions of this driver. If the user provides a value, it must be a power of 2 (in bytes) and no less than PAGE_SIZE; a sketch of setting it appears just below. The other capability, sum_fd_dlens, provides user control over a mechanism designed to prevent exhaustion of the host machine's memory.
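The promised sketch follows; it is illustrative only (not part of the patch) and assumes the patched include/uapi/scsi/sg.h is visible as <scsi/sg.h>. Per the driver code below, writing and reading SG_SEIM_SGAT_ELEM_SZ in the same call returns the element size that was in effect before the write.

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Returns the previous element size on success, -1 on error. A size that
 * is not a power of 2, or is less than PAGE_SIZE, yields EINVAL. */
static int set_sgat_elem_sz(int sg_fd, unsigned int sz)
{
        struct sg_extended_info sei;

        memset(&sei, 0, sizeof(sei));
        sei.sei_wr_mask = SG_SEIM_SGAT_ELEM_SZ; /* write the new size ... */
        sei.sei_rd_mask = SG_SEIM_SGAT_ELEM_SZ; /* ... read back the old */
        sei.sgat_elem_sz = sz;
        if (ioctl(sg_fd, SG_SET_GET_EXTENDED, &sei) < 0)
                return -1;
        return (int)sei.sgat_elem_sz;
}

A call such as set_sgat_elem_sz(sg_fd, 2 * sysconf(_SC_PAGESIZE)) would double the element size, since page sizes are themselves powers of 2.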
As for sum_fd_dlens: since requests per file descriptor are no longer limited to 16, thousands could be queued up by a badly designed program. If each one requests a large buffer (say 128 KB for READs) then, without this mechanism, the OOM killer may be called on to save the machine. The driver counts the cumulative size of outstanding data buffers held by each file descriptor. Once that figure exceeds the default threshold of 32 MB (SG_TOT_FD_THRESHOLD below), further submissions on that file descriptor are failed with E2BIG.

Signed-off-by: Douglas Gilbert Reported-by: kbuild test robot Reported-by: Dan Carpenter --- drivers/scsi/sg.c | 64 +++++++++++++++++++++++++++++++++++++++--- include/uapi/scsi/sg.h | 1 + 2 files changed, 61 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index c014fb24eca1..64e9de67ccd4 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -88,7 +88,11 @@ enum sg_rq_state { SG_RS_BUSY, /* temporary state should rarely be seen */ }; +/* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */ +#define SG_TOT_FD_THRESHOLD (32 * 1024 * 1024) + #define SG_TIME_UNIT_MS 0 /* milliseconds */ +/* #define SG_TIME_UNIT_NS 1 nanoseconds */ #define SG_DEF_TIME_UNIT SG_TIME_UNIT_MS #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) #define SG_FD_Q_AT_HEAD 0 @@ -223,7 +227,9 @@ struct sg_fd { /* holds the state of a file descriptor */ struct list_head rq_fl; /* head of sg_request free list */ int timeout; /* defaults to SG_DEFAULT_TIMEOUT */ int timeout_user; /* defaults to SG_DEFAULT_TIMEOUT_USER */ + int tot_fd_thresh; /* E2BIG if sum_of(dlen) > this, 0: ignore */ int sgat_elem_sz; /* initialized to scatter_elem_sz */ + atomic_t sum_fd_dlens; /* when tot_fd_thresh>0 this is sum_of(dlen) */ atomic_t submitted; /* number inflight or awaiting read */ atomic_t waiting; /* number of requests awaiting read */ unsigned long ffd_bm[1]; /* see SG_FFD_* defines above */ @@ -1939,8 +1945,8 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p) { int result = 0; int ret = 0; - int s_wr_mask, s_rd_mask; - u32 or_masks; + int n, j, s_wr_mask, s_rd_mask; + u32 uv, or_masks; struct sg_device *sdp = sfp->parentdp; struct sg_extended_info *seip; struct sg_extended_info sei; @@ -1957,6 +1963,17 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p) } SG_LOG(3, sdp, "%s: wr_mask=0x%x rd_mask=0x%x\n", __func__, s_wr_mask, s_rd_mask); + /* tot_fd_thresh (u32), [raw] [sum of active cmd dlen_s] */ + if (or_masks & SG_SEIM_TOT_FD_THRESH) { + if (s_wr_mask & SG_SEIM_TOT_FD_THRESH) { + uv = seip->tot_fd_thresh; + if (uv > 0 && uv < PAGE_SIZE) + uv = PAGE_SIZE; + sfp->tot_fd_thresh = uv; + } + if (s_rd_mask & SG_SEIM_TOT_FD_THRESH) + seip->tot_fd_thresh = sfp->tot_fd_thresh; + } /* check all boolean flags for either wr or rd mask set in or_mask */ if (or_masks & SG_SEIM_CTL_FLAGS) sg_extended_bool_flags(sfp, seip); @@ -1971,6 +1988,23 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p) } if ((s_rd_mask & SG_SEIM_READ_VAL) && (s_wr_mask & SG_SEIM_READ_VAL)) sg_extended_read_value(sfp, seip); + /* override scatter gather element size [rbw] (def: SG_SCATTER_SZ) */ + if (or_masks & SG_SEIM_SGAT_ELEM_SZ) { + n = sfp->sgat_elem_sz; + if (s_wr_mask & SG_SEIM_SGAT_ELEM_SZ) { + j = (int)seip->sgat_elem_sz; + if (j != (1 << ilog2(j)) || j < (int)PAGE_SIZE) { + SG_LOG(1, sdp, "%s: %s not power of 2, %s\n", + __func__, "sgat element size", + "or less than PAGE_SIZE"); + ret = -EINVAL; + } else { + sfp->sgat_elem_sz = j; + } + } + if (s_rd_mask & SG_SEIM_SGAT_ELEM_SZ) + seip->sgat_elem_sz = n; /* 
prior value if rw */ + } /* reserved_sz [raw], since may be reduced by other limits */ if (s_wr_mask & SG_SEIM_RESERVED_SIZE) { mutex_lock(&sfp->f_mutex); @@ -3289,6 +3323,8 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen) if (unlikely(rem_sz > 0)) /* must have failed */ return -ENOMEM; schp->buflen = align_sz; + if (sfp->tot_fd_thresh > 0) + atomic_add(align_sz, &sfp->sum_fd_dlens); return 0; err_out: for (j = 0; j < k; ++j) @@ -3335,6 +3371,15 @@ sg_remove_sgat(struct sg_request *srp) if (!test_bit(SG_FRQ_DIO_IN_USE, srp->frq_bm)) sg_remove_sgat_helper(sdp, schp); + if (sfp->tot_fd_thresh > 0) { + /* this is a subtraction, error if it goes negative */ + if (atomic_add_negative(-schp->buflen, &sfp->sum_fd_dlens)) { + SG_LOG(2, sfp->parentdp, + "%s: logic error: this dlen > %s\n", + __func__, "sum_fd_dlens"); + atomic_set(&sfp->sum_fd_dlens, 0); + } + } memset(schp, 0, sizeof(*schp)); /* zeros buflen and dlen */ } @@ -3560,6 +3605,7 @@ sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) { bool act_empty = false; bool mk_new_srp = true; + u32 sum_dlen; unsigned long iflags; enum sg_rq_state sr_st; struct sg_fd *fp = cwrp->sfp; @@ -3613,6 +3659,13 @@ sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) r_srp = ERR_PTR(-EDOM); SG_LOG(6, sdp, "%s: trying 2nd req but cmd_q=false\n", __func__); + } else if (fp->tot_fd_thresh > 0) { + sum_dlen = atomic_read(&fp->sum_fd_dlens) + dxfr_len; + if (sum_dlen > (u32)fp->tot_fd_thresh) { + r_srp = ERR_PTR(-E2BIG); + SG_LOG(2, sdp, "%s: sum_of_dlen(%u) > %s\n", + __func__, sum_dlen, "tot_fd_thresh"); + } } spin_unlock_irqrestore(&fp->rq_list_lock, iflags); if (IS_ERR(r_srp)) /* NULL is not an ERR here */ @@ -3742,6 +3795,8 @@ sg_add_sfp(struct sg_device *sdp) assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN); assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm, SG_DEF_TIME_UNIT); assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT); + sfp->tot_fd_thresh = SG_TOT_FD_THRESHOLD; + atomic_set(&sfp->sum_fd_dlens, 0); atomic_set(&sfp->submitted, 0); atomic_set(&sfp->waiting, 0); /* @@ -4237,8 +4292,9 @@ sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm), (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm), fp->ffd_bm[0]); - n += scnprintf(obp + n, len - n, " mmap_called=%d\n", - test_bit(SG_FFD_MMAP_CALLED, fp->ffd_bm)); + n += scnprintf(obp + n, len - n, " mmap_called=%d sum_fd_dlens=%u\n", + test_bit(SG_FFD_MMAP_CALLED, fp->ffd_bm), + atomic_read(&fp->sum_fd_dlens)); n += scnprintf(obp + n, len - n, " submitted=%d waiting=%d open thr_id=%d\n", atomic_read(&fp->submitted), diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index ca2b4819ddcd..378cf0532756 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -167,6 +167,7 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ #define SG_SEIM_CTL_FLAGS 0x1 /* ctl_flags_mask bits in ctl_flags */ #define SG_SEIM_READ_VAL 0x2 /* write SG_SEIRV_*, read back value */ #define SG_SEIM_RESERVED_SIZE 0x4 /* reserved_sz of reserve request */ +#define SG_SEIM_TOT_FD_THRESH 0x8 /* tot_fd_thresh of data buffers */ #define SG_SEIM_MINOR_INDEX 0x10 /* sg device minor index number */ #define SG_SEIM_SGAT_ELEM_SZ 0x80 /* sgat element size (>= PAGE_SIZE) */ #define SG_SEIM_ALL_BITS 0xff /* should be OR of previous items */ From patchwork Fri May 24 18:48:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 
10960343 From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 14/19] sg: tag and more_async Date: Fri, 24 May 2019 14:48:04 -0400 Message-Id: <20190524184809.25121-15-dgilbert@interlog.com> In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com>

Wire up some more capabilities of ioctl(SG_SET_GET_EXTENDED). One is to use an LLD or block layer generated tag rather than the user provided pack_id to track requests. Yielding tags to user space through an async interface should be considered work in progress, as there doesn't seem to be a safe mechanism to fetch that tag: when this driver reads it, it could be too early (in which case a default value is reported) or too late (i.e. the request has completed, at least internally). Meanwhile pack_id has been in the sg driver since version 1 and works well. The more_async flag, when set, instructs blk_get_request() not to block, which it does in the current driver on rare occasions for some obscure reason. IMO it should not block in async mode, but there is backward compatibility to consider. Note to kernel janitors: please do not change settings like GFP_ATOMIC to GFP_KERNEL in async code paths unless you really understand the implications for user space programs. A sketch of enabling both new features on a file descriptor follows.
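The sketch below is illustrative only (not part of the patch) and assumes the patched include/uapi/scsi/sg.h is visible as <scsi/sg.h>. Note that, in this patch's sg_ctl_ioreceive(), the tag is only preferred over pack_id when SG_SET_FORCE_PACK_ID is also active on the file descriptor, so the sketch enables that first.

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Switch one sg file descriptor to tag-based request matching and ask
 * for EAGAIN, rather than blocking, when the block layer has no free
 * requests. Returns 0 on success, -1 (with errno set) on error. */
static int prefer_tag_more_async(int sg_fd)
{
        int one = 1;
        struct sg_extended_info sei;

        if (ioctl(sg_fd, SG_SET_FORCE_PACK_ID, &one) < 0)
                return -1;
        memset(&sei, 0, sizeof(sei));
        sei.sei_wr_mask = SG_SEIM_CTL_FLAGS;
        sei.ctl_flags_wr_mask = SG_CTL_FLAGM_TAG_FOR_PACK_ID |
                                SG_CTL_FLAGM_MORE_ASYNC;
        sei.ctl_flags = SG_CTL_FLAGM_TAG_FOR_PACK_ID |
                        SG_CTL_FLAGM_MORE_ASYNC; /* both flags set to 1 */
        return ioctl(sg_fd, SG_SET_GET_EXTENDED, &sei);
}

After this, a v4 submission that sets SGV4_FLAG_YIELD_TAG in sg_io_v4::flags should see the allocated tag written back to sg_io_v4::request_tag.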
Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 278 +++++++++++++++++++++++++++++------------ include/uapi/scsi/sg.h | 3 + 2 files changed, 199 insertions(+), 82 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 64e9de67ccd4..6a26aa483d8e 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -44,6 +44,7 @@ static char *sg_version_date = "20190520"; #include /* for sg_check_file_access() */ #include #include /* used if CONFIG_SCSI_PROC_FS */ +#include #include #include @@ -106,6 +107,7 @@ enum sg_rq_state { #define SG_SHARE_FD_MASTER (-2) /* values >= 0 imply FD_SLAVE */ #define SG_PACK_ID_WILDCARD (-1) +#define SG_TAG_WILDCARD (-1) #define SG_ADD_RQ_MAX_RETRIES 40 /* to stop infinite _trylock(s) */ @@ -126,7 +128,9 @@ enum sg_rq_state { #define SG_FFD_MMAP_CALLED 3 /* mmap(2) system call made on fd */ #define SG_FFD_TIME_IN_NS 4 /* set: time in nanoseconds, else ms */ #define SG_FFD_Q_AT_TAIL 5 /* set: queue reqs at tail of blk q */ +#define SG_FFD_PREFER_TAG 7 /* prefer tag over pack_id (def) */ #define SG_FFD_NO_DURATION 9 /* don't do command duration calc */ +#define SG_FFD_MORE_ASYNC 10 /* yield EBUSY more often */ int sg_big_buff = SG_DEF_RESERVED_SIZE; @@ -199,15 +203,16 @@ struct sg_request { /* active SCSI command or inactive on free list (fl) */ struct sg_slice_hdr3 s_hdr3; /* subset of sg_io_hdr */ struct sg_slice_hdr4 s_hdr4; /* reduced size struct sg_io_v4 */ }; - u32 duration; /* cmd duration in milliseconds */ - u32 rq_flags; /* hold user supplied flags */ + u32 duration; /* cmd duration in milli or nano seconds */ + u32 rq_flags; /* flags given in v3 and v4 */ u32 rq_info; /* info supplied by v3 and v4 interfaces */ u32 rq_result; /* packed scsi request result from LLD */ int in_resid; /* requested-actual byte count on data-in */ - int pack_id; /* user provided packet identifier field */ + int pack_id; /* v3 pack_id or in v4 request_extra field */ int sense_len; /* actual sense buffer length (data-in) */ atomic_t rq_st; /* request state, holds a enum sg_rq_state */ u8 cmd_opcode; /* first byte of SCSI cdb */ + int tag; /* block layer identifier of request */ u64 start_ns; /* starting point of command duration calc */ unsigned long frq_bm[1]; /* see SG_FRQ_* defines above */ u8 *sense_bp; /* alloc-ed sense buffer, as needed */ @@ -227,7 +232,7 @@ struct sg_fd { /* holds the state of a file descriptor */ struct list_head rq_fl; /* head of sg_request free list */ int timeout; /* defaults to SG_DEFAULT_TIMEOUT */ int timeout_user; /* defaults to SG_DEFAULT_TIMEOUT_USER */ - int tot_fd_thresh; /* E2BIG if sum_of(dlen) > this, 0: ignore */ + int tot_fd_thresh; /* E2BIG if sum_of(dlen) > this, 0: ignore */ int sgat_elem_sz; /* initialized to scatter_elem_sz */ atomic_t sum_fd_dlens; /* when tot_fd_thresh>0 this is sum_of(dlen) */ atomic_t submitted; /* number inflight or awaiting read */ @@ -290,7 +295,8 @@ static int sg_rd_append(struct sg_request *srp, void __user *outp, static void sg_remove_sgat(struct sg_request *srp); static struct sg_fd *sg_add_sfp(struct sg_device *sdp); static void sg_remove_sfp(struct kref *); -static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id); +static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id, + bool is_tag); static struct sg_request *sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len); static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp); @@ -391,8 +397,8 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl) mutex_unlock(&sdp->open_rel_lock); res = 
wait_event_interruptible (sdp->open_wait, - (unlikely(atomic_read(&sdp->detaching)) || - !atomic_read(&sdp->open_cnt))); + unlikely(atomic_read(&sdp->detaching)) || + !atomic_read(&sdp->open_cnt)); mutex_lock(&sdp->open_rel_lock); if (unlikely(res)) /* -ERESTARTSYS */ @@ -801,6 +807,11 @@ sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, return PTR_ERR(srp); if (o_srp) *o_srp = srp; + if (p && !sync && (srp->rq_flags & SGV4_FLAG_YIELD_TAG)) { + struct sg_io_v4 __user *h4_up = (struct sg_io_v4 __user *)p; + + res = put_user(srp->tag, &h4_up->request_tag); + } return res; } @@ -826,7 +837,7 @@ static int sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p) { int res; - u8 hdr_store[SZ_SG_IO_V4]; /* max(v3interface, v4interface) */ + u8 hdr_store[SZ_SG_IO_V4]; /* max(v3interface, v4interface) */ struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store; struct sg_device *sdp = sfp->parentdp; @@ -844,6 +855,7 @@ static void sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) { bool at_head, is_v4h, sync; + unsigned long iflags; struct sg_device *sdp = sfp->parentdp; is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm); @@ -870,6 +882,17 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) atomic_inc(&sfp->submitted); blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk, srp->rq, (int)at_head, sg_rq_end_io); + + /* Should we try to find the req's tag? Only pay if answer is yes */ + if (is_v4h && (srp->rq_flags & SGV4_FLAG_YIELD_TAG)) { + spin_lock_irqsave(&srp->req_lck, iflags); + { /* we might be too late or too early! */ + struct request *rq = srp->rq; + + srp->tag = rq ? rq->tag : SG_TAG_WILDCARD; + } + spin_unlock_irqrestore(&srp->req_lck, iflags); + } } static inline int @@ -987,7 +1010,8 @@ sg_common_write(struct sg_comm_wr_t *cwrp) * returns true (or an event like a signal (e.g. control-C) occurs). */ static inline bool -sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id) +sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id, + bool is_tag) { struct sg_request *srp; @@ -995,7 +1019,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id) *srpp = NULL; return true; } - srp = sg_find_srp_by_id(sfp, pack_id); + srp = sg_find_srp_by_id(sfp, id, is_tag); *srpp = srp; return !!srp; } @@ -1040,7 +1064,7 @@ sg_copy_sense(struct sg_request *srp, bool v4_active) return sb_len_wr; } -#if IS_ENABLED(CONFIG_SCSI_LOGGING) +#if IS_ENABLED(SG_LOG_ACTIVE) static void sg_rep_rq_state_fail(struct sg_device *sdp, enum sg_rq_state exp_old_st, enum sg_rq_state want_st, enum sg_rq_state act_old_st) @@ -1178,8 +1202,10 @@ static int sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) { bool non_block = !!(filp->f_flags & O_NONBLOCK); + bool use_tag = false; int res, id; int pack_id = SG_PACK_ID_WILDCARD; + int tag = SG_TAG_WILDCARD; u8 v4_holder[SZ_SG_IO_V4]; struct sg_io_v4 *h4p = (struct sg_io_v4 *)v4_holder; struct sg_device *sdp = sfp->parentdp; @@ -1198,8 +1224,10 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) non_block = true; /* set by either this or O_NONBLOCK */ SG_LOG(3, sdp, "%s: non_block(+IMMED)=%d\n", __func__, non_block); /* read in part of v3 or v4 header for pack_id or tag based find */ - id = pack_id; - srp = sg_find_srp_by_id(sfp, id); + if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) + use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm); + id = use_tag ? 
tag : pack_id; + srp = sg_find_srp_by_id(sfp, id, use_tag); if (!srp) { /* nothing available so wait on packet or */ if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; @@ -1207,7 +1235,7 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) return -EAGAIN; res = wait_event_interruptible(sfp->read_wait, sg_get_ready_srp(sfp, &srp, - id)); + id, use_tag)); if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; if (unlikely(res)) @@ -1250,7 +1278,7 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p) if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) pack_id = h3p->pack_id; - srp = sg_find_srp_by_id(sfp, pack_id); + srp = sg_find_srp_by_id(sfp, pack_id, false); if (!srp) { /* nothing available so wait on packet or */ if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; @@ -1258,7 +1286,7 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p) return -EAGAIN; res = wait_event_interruptible (sfp->read_wait, - sg_get_ready_srp(sfp, &srp, pack_id)); + sg_get_ready_srp(sfp, &srp, pack_id, false)); if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; if (unlikely(res)) @@ -1414,15 +1442,15 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos) want_id = h2p->pack_id; } } - srp = sg_find_srp_by_id(sfp, want_id); + srp = sg_find_srp_by_id(sfp, want_id, false); if (!srp) { /* nothing available so wait on packet to arrive or */ if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */ return -EAGAIN; - ret = wait_event_interruptible(sfp->read_wait, - sg_get_ready_srp(sfp, &srp, - want_id)); + ret = wait_event_interruptible + (sfp->read_wait, + sg_get_ready_srp(sfp, &srp, want_id, false)); if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; if (ret) /* -ERESTARTSYS as signal hit process */ @@ -1555,10 +1583,10 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm); rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK); - rip->pack_id = srp->pack_id; + rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ? + srp->tag : srp->pack_id; rip->usr_ptr = test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ? uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr; - rip->usr_ptr = srp->s_hdr3.usr_ptr; spin_unlock(&srp->req_lck); } @@ -1660,8 +1688,9 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, return res; } +/* When use_tag is true then id is a tag, else it is a pack_id. */ static struct sg_request * -sg_match_request(struct sg_fd *sfp, int id) +sg_match_request(struct sg_fd *sfp, bool use_tag, int id) { bool found = false; struct sg_request *srp; @@ -1671,10 +1700,19 @@ sg_match_request(struct sg_fd *sfp, int id) if (id == SG_PACK_ID_WILDCARD) /* both wildcards same value: -1 */ return list_first_entry_or_null(&sfp->rq_list, struct sg_request, rq_entry); - list_for_each_entry(srp, &sfp->rq_list, rq_entry) { - if (id == srp->pack_id) { - found = true; - break; + if (use_tag) { + list_for_each_entry(srp, &sfp->rq_list, rq_entry) { + if (id == srp->tag) { + found = true; + break; + } + } + } else { + list_for_each_entry(srp, &sfp->rq_list, rq_entry) { + if (id == srp->pack_id) { + found = true; + break; + } } } return found ? 
srp : NULL; @@ -1683,7 +1721,8 @@ sg_match_request(struct sg_fd *sfp, int id) static int sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) { - int res, pack_id, id; + bool use_tag; + int res, pack_id, tag, id; unsigned long iflags; struct sg_fd *o_sfp; struct sg_request *srp; @@ -1695,10 +1734,12 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0) return -EPERM; pack_id = h4p->request_extra; - id = pack_id; + tag = h4p->request_tag; + use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm); + id = use_tag ? tag : pack_id; spin_lock(&sfp->rq_list_lock); - srp = sg_match_request(sfp, id); + srp = sg_match_request(sfp, use_tag, id); if (srp) { spin_lock_irqsave(&srp->req_lck, iflags); spin_unlock(&sfp->rq_list_lock); @@ -1709,7 +1750,7 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) if (o_sfp == sfp) continue; /* already checked */ spin_lock(&o_sfp->rq_list_lock); - srp = sg_match_request(o_sfp, id); + srp = sg_match_request(o_sfp, use_tag, id); if (srp) { spin_lock_irqsave(&srp->req_lck, iflags); spin_unlock(&o_sfp->rq_list_lock); @@ -1850,6 +1891,16 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) else *c_flgsp &= ~SG_CTL_FLAGM_TIME_IN_NS; } + /* TAG_FOR_PACK_ID boolean, [raw] search by tag or pack_id (def) */ + if (c_flgs_wm & SG_CTL_FLAGM_TAG_FOR_PACK_ID) + assign_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm, + !!(*c_flgsp & SG_CTL_FLAGM_TAG_FOR_PACK_ID)); + if (c_flgs_rm & SG_CTL_FLAGM_TAG_FOR_PACK_ID) { + if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) + *c_flgsp |= SG_CTL_FLAGM_TAG_FOR_PACK_ID; + else + *c_flgsp &= ~SG_CTL_FLAGM_TAG_FOR_PACK_ID; + } /* OTHER_OPENS boolean, [ro] any other sg open fds on this dev? 
*/ if (c_flgs_rm & SG_CTL_FLAGM_OTHER_OPENS) { if (atomic_read(&sdp->open_cnt) > 1) @@ -1879,6 +1930,18 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) else *c_flgsp &= ~SG_CTL_FLAGM_NO_DURATION; } + /* MORE_ASYNC boolean, [rbw] */ + if (c_flgs_rm & SG_CTL_FLAGM_MORE_ASYNC) + flg = test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm); + if (c_flgs_wm & SG_CTL_FLAGM_MORE_ASYNC) + assign_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm, + !!(*c_flgsp & SG_CTL_FLAGM_MORE_ASYNC)); + if (c_flgs_rm & SG_CTL_FLAGM_MORE_ASYNC) { + if (flg) + *c_flgsp |= SG_CTL_FLAGM_MORE_ASYNC; + else + *c_flgsp &= ~SG_CTL_FLAGM_MORE_ASYNC; + } } static void @@ -2166,11 +2229,23 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) case SG_GET_PACK_ID: /* or tag of oldest "read"-able, -1 if none */ rcu_read_lock(); val = -1; - list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { - if (SG_RS_AWAIT_READ(srp) && - !test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) { - val = srp->pack_id; - break; + if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) { + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (SG_RS_AWAIT_READ(srp) && + !test_bit(SG_FRQ_SYNC_INVOC, + srp->frq_bm)) { + val = srp->tag; + break; + } + } + } else { + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (SG_RS_AWAIT_READ(srp) && + !test_bit(SG_FRQ_SYNC_INVOC, + srp->frq_bm)) { + val = srp->pack_id; + break; + } } } rcu_read_unlock(); @@ -2370,11 +2445,12 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) sfp = filp->private_data; sdp = sfp->parentdp; + SG_LOG(3, sdp, "%s: cmd=0x%x\n", __func__, (int)cmd_in); if (unlikely(!sdp)) return -ENXIO; sdev = sdp->device; - if (sdev->host->hostt->compat_ioctl) { + if (sdev->host->hostt->compat_ioctl) { int ret; ret = sdev->host->hostt->compat_ioctl(sdev, cmd_in, (void __user *)arg); @@ -3059,6 +3135,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) int res = 0; int dxfer_len = 0; int r0w = READ; + int flgs; unsigned int iov_count = 0; void __user *up; struct request *rq; @@ -3108,17 +3185,15 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) q = sdp->device->request_queue; /* - * NOTE - * - * With scsi-mq enabled, there are a fixed number of preallocated - * requests equal in number to shost->can_queue. If all of the - * preallocated requests are already in use, then blk_get_request() - * will sleep until an active command completes, freeing up a request. - * Although waiting in an asynchronous interface is less than ideal, we - * do not want to use BLK_MQ_REQ_NOWAIT here because userspace might - * not expect an EWOULDBLOCK from this condition. + * For backward compatibility default to using blocking variant even + * when in non-blocking (async) mode. If the SG_CTL_FLAGM_MORE_ASYNC + * boolean set on this file descriptor, returns -EAGAIN if + * blk_get_request(BLK_MQ_REQ_NOWAIT) yields EAGAIN (aka EWOULDBLOCK). */ - rq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), 0); + flgs = (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm)) ? + BLK_MQ_REQ_NOWAIT : 0; + rq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), + flgs); if (unlikely(IS_ERR(rq))) { kfree(long_cmdp); return PTR_ERR(rq); @@ -3427,17 +3502,18 @@ sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer) /* * If there are multiple requests outstanding, the speed of this function is - * important. SG_PACK_ID_WILDCARD is -1 and that case is typically + * important. 
'id' is pack_id when is_tag=false, otherwise it is a tag. Both + * SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1 and that case is typically * the fast path. This function is only used in the non-blocking cases. * Returns pointer to (first) matching sg_request or NULL. If found, * sg_request state is moved from SG_RS_AWAIT_RD to SG_RS_BUSY. */ static struct sg_request * -sg_find_srp_by_id(struct sg_fd *sfp, int pack_id) +sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag) { __maybe_unused bool is_bad_st = false; __maybe_unused enum sg_rq_state bad_sr_st; - bool search_for_1 = (pack_id != SG_PACK_ID_WILDCARD); + bool search_for_1 = (id != SG_TAG_WILDCARD); enum sg_rq_state sr_st; int res; struct sg_request *srp = NULL; @@ -3447,8 +3523,13 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id) list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) continue; - if (srp->pack_id != pack_id) - continue; + if (is_tag) { + if (srp->tag != id) + continue; + } else { + if (srp->pack_id != id) + continue; + } sr_st = atomic_read(&srp->rq_st); switch (sr_st) { case SG_RS_AWAIT_RD: @@ -3488,16 +3569,17 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id) rcu_read_unlock(); if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) { if (search_for_1) { - struct sg_device *sdp = sfp->parentdp; - const char *cptp = "pack_id="; + __maybe_unused struct sg_device *sdp = sfp->parentdp; + __maybe_unused const char *cptp = is_tag ? "tag=" : + "pack_id="; if (is_bad_st) SG_LOG(1, sdp, "%s: %s%d wrong state: %s\n", - __func__, cptp, pack_id, + __func__, cptp, id, sg_rq_st_str(bad_sr_st, true)); else SG_LOG(6, sdp, "%s: %s%d not awaiting read\n", - __func__, cptp, pack_id); + __func__, cptp, id); } } return NULL; @@ -3505,7 +3587,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id) rcu_read_unlock(); if (search_for_1) { SG_LOG(6, sfp->parentdp, "%s: %s%d found [srp=0x%p]\n", - __func__, "pack_id=", pack_id, srp); + __func__, (is_tag ? "tag=" : "pack_id="), id, srp); } return srp; } @@ -3530,6 +3612,7 @@ sg_mk_srp(struct sg_fd *sfp, bool first) spin_lock_init(&srp->req_lck); atomic_set(&srp->rq_st, SG_RS_INACTIVE); srp->parentfp = sfp; + srp->tag = SG_TAG_WILDCARD; return srp; } else { return ERR_PTR(-ENOMEM); @@ -3731,7 +3814,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) srp->sense_bp = NULL; atomic_set(&srp->rq_st, SG_RS_BUSY); list_del_rcu(&srp->rq_entry); - kfree(sbp); /* maybe orphaned req, thus never read */ + kfree(sbp); /* maybe orphaned req, thus never read */ /* * N.B. sg_request object is not de-allocated (freed). The contents * of the rq_list and rq_fl lists are de-allocated (freed) when @@ -3881,7 +3964,7 @@ sg_remove_sfp_usercontext(struct work_struct *work) list_del(&srp->rq_entry); if (srp->sgat_h.buflen > 0) sg_remove_sgat(srp); - kfree(srp->sense_bp); /* abnormal close: device detached */ + kfree(srp->sense_bp); /* abnormal close: device detached */ SG_LOG(6, sdp, "%s:%s%p --\n", __func__, cp, srp); kfree(srp); } @@ -3940,14 +4023,18 @@ sg_lookup_dev(int dev) return idr_find(&sg_index_idr, dev); } +/* + * Returns valid pointer to a sg_device object on success or a negated + * errno value on failure. Does not return NULL. 
+ */ static struct sg_device * -sg_get_dev(int dev) +sg_get_dev(int min_dev) { struct sg_device *sdp; - unsigned long flags; + unsigned long iflags; - read_lock_irqsave(&sg_index_lock, flags); - sdp = sg_lookup_dev(dev); + read_lock_irqsave(&sg_index_lock, iflags); + sdp = sg_lookup_dev(min_dev); if (!sdp) sdp = ERR_PTR(-ENXIO); else if (atomic_read(&sdp->detaching)) { @@ -3957,8 +4044,7 @@ sg_get_dev(int dev) sdp = ERR_PTR(-ENODEV); } else kref_get(&sdp->d_ref); - read_unlock_irqrestore(&sg_index_lock, flags); - + read_unlock_irqrestore(&sg_index_lock, iflags); return sdp; } @@ -3994,7 +4080,7 @@ static int sg_proc_seq_show_int(struct seq_file *s, void *v); static int sg_proc_single_open_adio(struct inode *inode, struct file *filp); static ssize_t sg_proc_write_adio(struct file *filp, const char __user *buffer, - size_t count, loff_t *off); + size_t count, loff_t *off); static const struct file_operations adio_fops = { .owner = THIS_MODULE, .open = sg_proc_single_open_adio, @@ -4037,12 +4123,20 @@ static const struct seq_operations devstrs_seq_ops = { .show = sg_proc_seq_show_devstrs, }; -static int sg_proc_seq_show_dbg(struct seq_file *s, void *v); -static const struct seq_operations dbg_seq_ops = { +static int sg_proc_seq_show_full_dbg(struct seq_file *s, void *v); +static const struct seq_operations full_dbg_seq_ops = { .start = dev_seq_start, .next = dev_seq_next, .stop = dev_seq_stop, - .show = sg_proc_seq_show_dbg, + .show = sg_proc_seq_show_full_dbg, +}; + +static int sg_proc_seq_show_red_dbg(struct seq_file *s, void *v); +static const struct seq_operations red_dbg_seq_ops = { + .start = dev_seq_start, + .next = dev_seq_next, + .stop = dev_seq_stop, + .show = sg_proc_seq_show_red_dbg, }; static int @@ -4055,11 +4149,12 @@ sg_proc_init(void) return 1; proc_create("allow_dio", 0644, p, &adio_fops); - proc_create_seq("debug", 0444, p, &dbg_seq_ops); + proc_create_seq("debug", 0444, p, &full_dbg_seq_ops); proc_create("def_reserved_size", 0644, p, &dressz_fops); proc_create_single("device_hdr", 0444, p, sg_proc_seq_show_devhdr); proc_create_seq("devices", 0444, p, &dev_seq_ops); proc_create_seq("device_strs", 0444, p, &devstrs_seq_ops); + proc_create_seq("red_debug", 0444, p, &red_dbg_seq_ops); proc_create_single("version", 0444, p, sg_proc_seq_show_version); return 0; } @@ -4089,7 +4184,7 @@ sg_proc_single_open_adio(struct inode *inode, struct file *filp) return single_open(filp, sg_proc_seq_show_int, &sg_allow_dio); } -static ssize_t +static ssize_t sg_proc_write_adio(struct file *filp, const char __user *buffer, size_t count, loff_t *off) { @@ -4111,7 +4206,7 @@ sg_proc_single_open_dressz(struct inode *inode, struct file *filp) return single_open(filp, sg_proc_seq_show_int, &sg_big_buff); } -static ssize_t +static ssize_t sg_proc_write_dressz(struct file *filp, const char __user *buffer, size_t count, loff_t *off) { @@ -4174,7 +4269,7 @@ dev_seq_next(struct seq_file *s, void *v, loff_t *pos) struct sg_proc_deviter *it = s->private; *pos = ++it->index; - return (it->index < it->max) ? it : NULL; + return (it->index < (int)it->max) ? it : NULL; } static void @@ -4193,8 +4288,7 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v) read_lock_irqsave(&sg_index_lock, iflags); sdp = it ? 
sg_lookup_dev(it->index) : NULL; - if ((NULL == sdp) || (NULL == sdp->device) || - (atomic_read(&sdp->detaching))) + if (!sdp || !sdp->device || atomic_read(&sdp->detaching)) seq_puts(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n"); else { scsidp = sdp->device; @@ -4260,9 +4354,14 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp, srp->sgat_h.buflen, (int)srp->pack_id); if (is_dur) /* cmd/req has completed, waiting for ... */ n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp); - else if (dur < U32_MAX) /* in-flight or busy (so ongoing) */ + else if (dur < U32_MAX) { /* in-flight or busy (so ongoing) */ + if ((srp->rq_flags & SGV4_FLAG_YIELD_TAG) && + srp->tag != SG_TAG_WILDCARD) + n += scnprintf(obp + n, len - n, " tag=0x%x", + srp->tag); n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s", to / 1000, dur, tp); + } n += scnprintf(obp + n, len - n, " sgat=%d op=0x%02x\n", srp->sgat_h.num_sgat, srp->cmd_opcode); return n; @@ -4270,7 +4369,7 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp, /* Writes debug info for one sg fd (including its sg requests) in obp buffer */ static int -sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) +sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len, bool reduced) __must_hold(&sfp->rq_list_lock) { bool first_fl; @@ -4299,6 +4398,8 @@ sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) " submitted=%d waiting=%d open thr_id=%d\n", atomic_read(&fp->submitted), atomic_read(&fp->waiting), fp->tid); + if (reduced) + return n; list_for_each_entry_rcu(srp, &fp->rq_list, rq_entry) { spin_lock(&srp->req_lck); n += sg_proc_dbg_sreq(srp, fp->timeout, @@ -4325,7 +4426,8 @@ sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len) /* Writes debug info for one sg device (including its sg fds) in obp buffer */ static int -sg_proc_dbg_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp) +sg_proc_dbg_sdev(struct sg_device *sdp, char *obp, int len, + int *fd_counterp, bool reduced) __must_hold(&sdp->sfd_llock) { int n = 0; @@ -4348,7 +4450,7 @@ sg_proc_dbg_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp) ++*countp; rcu_read_lock(); /* assume irqs disabled */ n += scnprintf(obp + n, len - n, " FD(%d): ", *countp); - n += sg_proc_dbg_fd(fp, obp + n, len - n); + n += sg_proc_dbg_fd(fp, obp + n, len - n, reduced); rcu_read_unlock(); } return n; @@ -4356,7 +4458,7 @@ sg_proc_dbg_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp) /* Called via dbg_seq_ops once for each sg device */ static int -sg_proc_seq_show_dbg(struct seq_file *s, void *v) +sg_proc_seq_show_dbg(struct seq_file *s, void *v, bool reduced) { bool found = false; bool trunc = false; @@ -4394,7 +4496,7 @@ sg_proc_seq_show_dbg(struct seq_file *s, void *v) snprintf(b1, sizeof(b1), " >>> device=%s %s\n", disk_name, "detaching pending close\n"); else if (sdp->device) { - n = sg_proc_dbg_sdev(sdp, bp, bp_len, fdi_p); + n = sg_proc_dbg_sdev(sdp, bp, bp_len, fdi_p, reduced); if (n >= bp_len - 1) { trunc = true; if (bp[bp_len - 2] != '\n') @@ -4427,7 +4529,19 @@ sg_proc_seq_show_dbg(struct seq_file *s, void *v) return 0; } -#endif /* CONFIG_SCSI_PROC_FS (~600 lines back) */ +static int +sg_proc_seq_show_full_dbg(struct seq_file *s, void *v) +{ + return sg_proc_seq_show_dbg(s, v, false); +} + +static int +sg_proc_seq_show_red_dbg(struct seq_file *s, void *v) +{ + return sg_proc_seq_show_dbg(s, v, true); +} + +#endif /* CONFIG_SCSI_PROC_FS (~600 lines back) */ module_init(init_sg); module_exit(exit_sg); diff --git 
a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index 378cf0532756..5223ba33fb8d 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -106,6 +106,7 @@ typedef struct sg_io_hdr { */ #define SGV4_FLAG_DIRECT_IO SG_FLAG_DIRECT_IO #define SGV4_FLAG_MMAP_IO SG_FLAG_MMAP_IO +#define SGV4_FLAG_YIELD_TAG 0x8 /* sg_io_v4::request_tag set after SG_IOS */ #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */ @@ -174,10 +175,12 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ /* flag and mask values for boolean fields follow */ #define SG_CTL_FLAGM_TIME_IN_NS 0x1 /* time: nanosecs (def: millisecs) */ +#define SG_CTL_FLAGM_TAG_FOR_PACK_ID 0x2 /* prefer tag over pack_id (def) */ #define SG_CTL_FLAGM_OTHER_OPENS 0x4 /* rd: other sg fd_s on this dev */ #define SG_CTL_FLAGM_ORPHANS 0x8 /* rd: orphaned requests on this fd */ #define SG_CTL_FLAGM_Q_TAIL 0x10 /* used for future cmds on this fd */ #define SG_CTL_FLAGM_NO_DURATION 0x400 /* don't calc command duration */ +#define SG_CTL_FLAGM_MORE_ASYNC 0x800 /* yield EAGAIN in more cases */ #define SG_CTL_FLAGM_ALL_BITS 0xfff /* should be OR of previous items */ /* Write one of the following values to sg_extended_info::read_value, get... */

From patchwork Fri May 24 18:48:05 2019 X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960345 From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 15/19] sg: 
From patchwork Fri May 24 18:48:05 2019 X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960345 From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 15/19] sg: add fd sharing, change, unshare Date: Fri, 24 May 2019 14:48:05 -0400 Message-Id: <20190524184809.25121-16-dgilbert@interlog.com> In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Add the ability to establish a share between any two open file descriptors in the sg driver. Neither file descriptor can already be part of a share. This fd share is used for two features added and described in later patches: request sharing and the "do_on_other" flag used when multiple requests are issued (with a single invocation from user space). See the webpage at: http://sg.danny.cz/sg/sg_v40.html in the section titled: "5 Sharing file descriptors". Usually two file descriptors are enough. To support the ability to READ once and then WRITE to two or more file descriptors (hence potentially to write the same data to different disks), the ability to drop the share partner file descriptor and replace it with a new fd is also available. Finally, a share can be explicitly undone, or unshared, by either side. In practice, close()ing either side of a fd share has the same effect (i.e. it unshares), so that route is the more common one. File shares may be within a single-threaded process, between threads in the same process, or even between processes (on the same machine) by passing an open file descriptor via Unix sockets to the other process. Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 491 +++++++++++++++++++++++++++++++++-------- include/uapi/scsi/sg.h | 3 + 2 files changed, 404 insertions(+), 90 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 6a26aa483d8e..44f09c65e0b9 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -33,6 +33,7 @@ static char *sg_version_date = "20190520"; #include #include #include +#include /* for fget() and fput() */ #include #include #include @@ -43,6 +44,7 @@ static char *sg_version_date = "20190520"; #include #include /* for sg_check_file_access() */ #include +#include #include /* used if CONFIG_SCSI_PROC_FS */ #include @@ -129,6 +131,7 @@ enum sg_rq_state { #define SG_FFD_TIME_IN_NS 4 /* set: time in nanoseconds, else ms */ #define SG_FFD_Q_AT_TAIL 5 /* set: queue reqs at tail of blk q */ #define SG_FFD_PREFER_TAG 7 /* prefer tag over pack_id (def) */ +#define SG_FFD_RELEASE 8 /* release (close) underway */ #define SG_FFD_NO_DURATION 9 /* don't do command duration calc */ #define SG_FFD_MORE_ASYNC 10 /* yield EBUSY more often */ @@ -234,13 +237,16 @@ struct sg_fd { /* holds the state of a file descriptor */ int timeout_user; /* defaults to SG_DEFAULT_TIMEOUT_USER */ int tot_fd_thresh; /* E2BIG if sum_of(dlen) > this, 0: ignore */ int sgat_elem_sz; /* initialized to scatter_elem_sz */ + int shr_fd; /* init: SG_SHARE_FD_UNUSED; master: -2 */ atomic_t sum_fd_dlens; /* when tot_fd_thresh>0 this is sum_of(dlen) */ atomic_t submitted; /* number inflight or awaiting read */ atomic_t waiting; /* number of requests awaiting read */ unsigned long ffd_bm[1]; /* see SG_FFD_* defines above */ pid_t tid; /* thread id when opened */ u8 next_cmd_len; /* 0: automatic, >0: use on next write() */ + struct file *filp; /* my identity when sharing */ struct sg_request *rsv_srp;/* one reserve request per fd */ + struct sg_fd *share_sfp;/* master+slave shares set this, else NULL */ struct fasync_struct *async_qp; /* used by
asynchronous notification */ struct kref f_ref; struct execute_work ew_fd; /* harvest all fd resources and lists */ @@ -273,7 +279,6 @@ struct sg_comm_wr_t { /* arguments to sg_common_write() */ struct sg_io_v4 *h4p; }; struct sg_fd *sfp; - struct file *filp; const u8 __user *u_cmdp; const u8 *cmdp; }; @@ -286,14 +291,15 @@ static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir); static void sg_finish_scsi_blk_rq(struct sg_request *srp); static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen); -static int sg_v3_submit(struct file *filp, struct sg_fd *sfp, - struct sg_io_hdr *hp, bool sync, +static int sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, + void __user *p); +static int sg_v3_submit(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_request **o_srp); static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp); static int sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer); static void sg_remove_sgat(struct sg_request *srp); -static struct sg_fd *sg_add_sfp(struct sg_device *sdp); +static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp); static void sg_remove_sfp(struct kref *); static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag); @@ -514,7 +520,7 @@ sg_open(struct inode *inode, struct file *filp) /* Next done in sg_alloc(), repeat here to pick up change? */ sg_calc_sgat_param(sdp); } - sfp = sg_add_sfp(sdp); + sfp = sg_add_sfp(sdp, filp); if (IS_ERR(sfp)) { res = PTR_ERR(sfp); goto out_undo; @@ -563,6 +569,7 @@ sg_release(struct inode *inode, struct file *filp) atomic_read(&sdp->open_cnt)); if (unlikely(!sdp)) return -ENXIO; + set_bit(SG_FFD_RELEASE, sfp->ffd_bm); mutex_lock(&sdp->open_rel_lock); scsi_autopm_put_device(sdp->device); @@ -634,7 +641,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) if (__copy_from_user(h3u8p + SZ_SG_HEADER, p + SZ_SG_HEADER, SZ_SG_IO_HDR - SZ_SG_HEADER)) return -EFAULT; - res = sg_v3_submit(filp, sfp, h3p, false, NULL); + res = sg_v3_submit(sfp, h3p, false, NULL); return res < 0 ? 
res : (int)count; } /* v1 and v2 interfaces processed below this point */ @@ -698,7 +705,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) cwr.h3p = h3p; cwr.timeout = sfp->timeout; cwr.cmd_len = cmd_size; - cwr.filp = filp; cwr.sfp = sfp; cwr.u_cmdp = p; cwr.cmdp = NULL; @@ -707,19 +713,18 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) } static int -sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp, - int len, u8 *cdbp) +sg_fetch_cmnd(struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp) { if (!u_cdbp || len < 6 || len > SG_MAX_CDB_SIZE) return -EMSGSIZE; if (copy_from_user(cdbp, u_cdbp, len)) return -EFAULT; - if (O_RDWR != (filp->f_flags & O_ACCMODE)) { /* read-only */ + if (O_RDWR != (sfp->filp->f_flags & O_ACCMODE)) { /* read-only */ switch (sfp->parentdp->device->type) { case TYPE_DISK: case TYPE_RBC: case TYPE_ZBC: - return blk_verify_command(cdbp, filp->f_mode); + return blk_verify_command(cdbp, sfp->filp->f_mode); default: /* SSC, SES, etc cbd_s may differ from SBC */ break; } @@ -728,8 +733,8 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp, } static int -sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, - bool sync, struct sg_request **o_srp) +sg_v3_submit(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, + struct sg_request **o_srp) { unsigned long ul_timeout; struct sg_request *srp; @@ -752,7 +757,6 @@ sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, cwr.h3p = hp; cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX); cwr.cmd_len = hp->cmd_len; - cwr.filp = filp; cwr.sfp = sfp; cwr.u_cmdp = hp->cmdp; cwr.cmdp = NULL; @@ -765,8 +769,8 @@ sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp, } static int -sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, - struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp) +sg_v4_submit(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, + bool sync, struct sg_request **o_srp) { int res = 0; unsigned long ul_timeout; @@ -774,7 +778,6 @@ sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, struct sg_comm_wr_t cwr; memset(&cwr, 0, sizeof(cwr)); - cwr.filp = filp; cwr.sfp = sfp; cwr.h4p = h4p; if (h4p->flags & SG_FLAG_MMAP_IO) { @@ -816,38 +819,38 @@ sg_v4_submit(struct file *filp, struct sg_fd *sfp, void __user *p, } static int -sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p) +sg_ctl_iosubmit(struct sg_fd *sfp, void __user *p) { int res; u8 hdr_store[SZ_SG_IO_V4]; struct sg_io_v4 *h4p = (struct sg_io_v4 *)hdr_store; struct sg_device *sdp = sfp->parentdp; - res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK)); + res = sg_allow_if_err_recovery(sdp, (sfp->filp->f_flags & O_NONBLOCK)); if (unlikely(res)) return res; if (copy_from_user(hdr_store, p, SZ_SG_IO_V4)) return -EFAULT; if (h4p->guard == 'Q') - return sg_v4_submit(filp, sfp, p, h4p, false, NULL); + return sg_v4_submit(sfp, p, h4p, false, NULL); return -EPERM; } static int -sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p) +sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p) { int res; u8 hdr_store[SZ_SG_IO_V4]; /* max(v3interface, v4interface) */ struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store; struct sg_device *sdp = sfp->parentdp; - res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK)); + res = sg_allow_if_err_recovery(sdp, (sfp->filp->f_flags & O_NONBLOCK)); if 
(unlikely(res)) return res; if (copy_from_user(h3p, p, SZ_SG_IO_HDR)) return -EFAULT; if (h3p->interface_id == 'S') - return sg_v3_submit(filp, sfp, h3p, false, NULL); + return sg_v3_submit(sfp, h3p, false, NULL); return -EPERM; } @@ -1107,43 +1110,6 @@ sg_rec_v3v4_state(struct sg_fd *sfp, struct sg_request *srp, bool v4_active) return err; } -/* - * Completes a v3 request/command. Called from sg_read {v2 or v3}, - * ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when its - * completing a v3 request/command. - */ -static int -sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) -{ - int err, err2; - int rq_result = srp->rq_result; - struct sg_io_hdr hdr3; - struct sg_io_hdr *hp = &hdr3; - - SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp); - err = sg_rec_v3v4_state(sfp, srp, false); - memset(hp, 0, sizeof(*hp)); - memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3)); - hp->sb_len_wr = srp->sense_len; - hp->info = srp->rq_info; - hp->resid = srp->in_resid; - hp->duration = srp->duration; - hp->status = rq_result & 0xff; - hp->masked_status = status_byte(rq_result); - hp->msg_status = msg_byte(rq_result); - hp->host_status = host_byte(rq_result); - hp->driver_status = driver_byte(rq_result); - /* copy_*_user() [without leading underscores] checks access */ - if (copy_to_user(p, hp, SZ_SG_IO_HDR)) - err = err ? err : -EFAULT; - err2 = sg_rstate_chg(srp, atomic_read(&srp->rq_st), SG_RS_DONE_RD); - if (err2) - err = err ? err : err2; - sg_finish_scsi_blk_rq(srp); - sg_deact_request(sfp, srp); - return err ? err : 0; -} - static int sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct sg_io_v4 *h4p) @@ -1199,9 +1165,9 @@ sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, * otherwise it waits (i.e. it "blocks"). */ static int -sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) +sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p) { - bool non_block = !!(filp->f_flags & O_NONBLOCK); + bool non_block = !!(sfp->filp->f_flags & O_NONBLOCK); bool use_tag = false; int res, id; int pack_id = SG_PACK_ID_WILDCARD; @@ -1252,9 +1218,9 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p) * otherwise it waits. */ static int -sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p) +sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p) { - bool non_block = !!(filp->f_flags & O_NONBLOCK); + bool non_block = !!(sfp->filp->f_flags & O_NONBLOCK); int res; int pack_id = SG_PACK_ID_WILDCARD; u8 v3_holder[SZ_SG_IO_HDR]; @@ -1469,6 +1435,43 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos) return ret < 0 ? ret : (int)count; } +/* + * Completes a v3 request/command. Called from sg_read {v2 or v3}, + * ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when its + * completing a v3 request/command. 
+ */ +static int +sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) +{ + int err, err2; + int rq_result = srp->rq_result; + struct sg_io_hdr hdr3; + struct sg_io_hdr *hp = &hdr3; + + SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp); + err = sg_rec_v3v4_state(sfp, srp, false); + memset(hp, 0, sizeof(*hp)); + memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3)); + hp->sb_len_wr = srp->sense_len; + hp->info = srp->rq_info; + hp->resid = srp->in_resid; + hp->duration = srp->duration; + hp->status = rq_result & 0xff; + hp->masked_status = status_byte(rq_result); + hp->msg_status = msg_byte(rq_result); + hp->host_status = host_byte(rq_result); + hp->driver_status = driver_byte(rq_result); + /* copy_*_user() [without leading underscores] checks access */ + if (copy_to_user(p, hp, SZ_SG_IO_HDR)) + err = err ? err : -EFAULT; + err2 = sg_rstate_chg(srp, atomic_read(&srp->rq_st), SG_RS_DONE_RD); + if (err2) + err = err ? err : err2; + sg_finish_scsi_blk_rq(srp); + sg_deact_request(sfp, srp); + return err ? err : 0; +} + static int max_sectors_bytes(struct request_queue *q) { @@ -1511,6 +1514,78 @@ sg_calc_sgat_param(struct sg_device *sdp) sdp->max_sgat_sz = sz; } +static void +sg_unshare_fds(struct sg_fd *ma_sfp, struct sg_fd *sl_sfp) +{ + if (ma_sfp) { + ma_sfp->share_sfp = NULL; + ma_sfp->shr_fd = SG_SHARE_FD_UNUSED; + } + if (sl_sfp) { + sl_sfp->share_sfp = NULL; + sl_sfp->shr_fd = SG_SHARE_FD_UNUSED; + } +} + +/* + * Active when writing 1 to ioctl(SG_SET_GET_EXTENDED(CTL_FLAGS(UNSHARE))), + * writing 0 has no effect. Undoes the configuration that was done by + * ioctl(SG_SET_GET_EXTENDED(SHARE_FD)). + */ +static void +sg_chk_unshare(struct sg_fd *sfp, bool unshare_val) +{ + bool retry; + int retry_count = 0; + unsigned long iflags; + struct sg_fd *ma_sfp; + struct sg_fd *sl_sfp; + struct sg_fd *o_sfp = sfp->share_sfp;/* other side of existing share */ + + if (sfp->shr_fd == SG_SHARE_FD_UNUSED || !o_sfp) + return; /* no share to undo */ + if (!unshare_val) + return; +again: + retry = false; + spin_lock_irqsave(&sfp->rq_list_lock, iflags); + if (sfp->shr_fd == SG_SHARE_FD_MASTER) { /* called on master fd */ + ma_sfp = sfp; + sl_sfp = o_sfp; + if (!spin_trylock(&sl_sfp->rq_list_lock)) { + if (++retry_count > SG_ADD_RQ_MAX_RETRIES) + SG_LOG(1, sfp->parentdp, + "%s: cannot get slave lock\n", + __func__); + else + retry = true; + goto fini; + } + sg_unshare_fds(ma_sfp, sl_sfp); + spin_unlock(&sl_sfp->rq_list_lock); + } else { /* called on slave fd */ + ma_sfp = o_sfp; + sl_sfp = sfp; + if (!spin_trylock(&ma_sfp->rq_list_lock)) { + if (++retry_count > SG_ADD_RQ_MAX_RETRIES) + SG_LOG(1, sfp->parentdp, + "%s: cannot get master lock\n", + __func__); + else + retry = true; + goto fini; + } + sg_unshare_fds(ma_sfp, sl_sfp); + spin_unlock(&ma_sfp->rq_list_lock); + } +fini: + spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); + if (retry) { + cpu_relax(); + goto again; + } +} + /* * Returns duration since srp->start_ns (using boot time as an epoch). Unit * is nanoseconds when time_in_ns==true; else it is in milliseconds.
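A usage sketch of the UNSHARE control flag that sg_chk_unshare() above services, as seen from user space. Illustrative only: the sei_wr_mask field name of struct sg_extended_info is an assumption inferred from how this series uses the extended ioctl, and error handling is omitted.

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>    /* sg v4 interface as proposed in this series */

/*
 * Illustrative sketch: undo an fd share from either side. Writing 1 to
 * the UNSHARE boolean lands in sg_chk_unshare(); writing 0 is a no-op.
 */
static int sg_unshare(int sg_fd)
{
        struct sg_extended_info sei;

        memset(&sei, 0, sizeof(sei));
        sei.sei_wr_mask = SG_SEIM_CTL_FLAGS;    /* field name assumed */
        sei.ctl_flags_wr_mask = SG_CTL_FLAGM_UNSHARE;
        sei.ctl_flags = SG_CTL_FLAGM_UNSHARE;   /* the "1" being written */
        return ioctl(sg_fd, SG_SET_GET_EXTENDED, &sei);
}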
@@ -1602,8 +1677,8 @@ sg_rq_landed(struct sg_device *sdp, struct sg_request *srp) * the blocking multiple request case */ static int -sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p, - struct sg_io_v4 *h4p, struct sg_request *srp) +sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, + struct sg_request *srp) { int res; enum sg_rq_state sr_st; @@ -1643,7 +1718,7 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p, * Returns 0 on success else a negated errno. */ static int -sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, +sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) { int res; @@ -1653,7 +1728,8 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr; SG_LOG(3, sdp, "%s: SG_IO%s\n", __func__, - ((filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" : "")); + ((sfp->filp->f_flags & O_NONBLOCK) ? + " O_NONBLOCK ignored" : "")); res = sg_allow_if_err_recovery(sdp, false); if (unlikely(res)) return res; @@ -1667,9 +1743,9 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, ((u8 __user *)p) + SZ_SG_IO_HDR, SZ_SG_IO_V4 - SZ_SG_IO_HDR)) return -EFAULT; - res = sg_v4_submit(filp, sfp, p, h4p, true, &srp); + res = sg_v4_submit(sfp, p, h4p, true, &srp); } else if (h3p->interface_id == 'S') { - res = sg_v3_submit(filp, sfp, h3p, true, &srp); + res = sg_v3_submit(sfp, h3p, true, &srp); } else { pr_info_once("sg: %s: v3 or v4 interface only here\n", __func__); @@ -1679,7 +1755,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, return res; if (!srp) /* mrq case: already processed all responses */ return res; - res = sg_wait_event_srp(filp, sfp, p, h4p, srp); + res = sg_wait_event_srp(sfp, p, h4p, srp); if (unlikely(res)) { SG_LOG(1, sdp, "%s: %s=0x%p state: %s\n", __func__, "unexpected srp", srp, @@ -1792,6 +1868,215 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) return res; } +static int +sg_idr_max_id(int id, void *p, void *data) +{ + int *k = data; + + if (*k < id) + *k = id; + + return 0; +} + +/* + * Scans sg driver object tree looking for search_for. Returns valid pointer + * if found; returns negated errno twisted by ERR_PTR(); or return NULL if + * not found (and no error). 
+ */ +static struct sg_fd * +sg_find_sfp_by_fd(const struct file *search_for, int search_fd, + struct sg_fd *from_sfp, bool from_is_master) +{ + bool found = false; + int k, num_d; + int res = 0; + unsigned long iflags; + struct sg_fd *sfp; + struct sg_device *sdp; + + num_d = -1; + read_lock_irqsave(&sg_index_lock, iflags); + idr_for_each(&sg_index_idr, sg_idr_max_id, &num_d); + ++num_d; + for (k = 0; k < num_d; ++k) { + sdp = idr_find(&sg_index_idr, k); + if (!sdp || unlikely(atomic_read(&sdp->detaching))) + continue; + read_lock(&sdp->sfd_llock); + list_for_each_entry(sfp, &sdp->sfds, sfd_entry) { + if (test_bit(SG_FFD_RELEASE, sfp->ffd_bm)) + continue; + spin_lock(&sfp->rq_list_lock); + if (search_for != sfp->filp) { + spin_unlock(&sfp->rq_list_lock); + continue; /* not this one */ + } + if (sfp->shr_fd != SG_SHARE_FD_UNUSED) { + res = -EADDRNOTAVAIL;/* already sharing */ + } else if (SG_RS_ACTIVE(sfp->rsv_srp)) { + res = -EBUSY;/* master reserve rq busy */ + } else if (sdp->exclude) {/* O_EXCL used on this dev */ + res = -EPERM; + } else { + res = 0; + found = true; + if (spin_trylock(&from_sfp->rq_list_lock)) { + from_sfp->share_sfp = sfp; + from_sfp->shr_fd = from_is_master ? + SG_SHARE_FD_MASTER : search_fd; + spin_unlock(&from_sfp->rq_list_lock); + sfp->share_sfp = from_sfp; + sfp->shr_fd = from_is_master ? + search_fd : SG_SHARE_FD_MASTER; + } else { /* ?? borrow EPROBE_DEFER + encore */ + res = -EPROBE_DEFER; + } + } + spin_unlock(&sfp->rq_list_lock); + if (res || found) + break; + } /* end of loop of all fd_s in current device */ + read_unlock(&sdp->sfd_llock); + if (res || found) + break; + } /* end of loop of all sg devices */ + read_unlock_irqrestore(&sg_index_lock, iflags); + if (unlikely(res < 0)) + return ERR_PTR(res); + return found ? sfp : NULL; +} + +/* + * After checking the proposed master-slave relationship is unique and valid, + * sets up pointers between master and slave sg_fd objects. Returns 0 on + * success or negated errno value. From ioctl(EXTENDED(SG_SEIM_SHARE_FD)). + */ +static int +sg_fd_share(struct sg_fd *sl_sfp, int m_fd) +{ + bool found = false; + int res = 0; + int retry_count = 0; + struct file *fp; + struct sg_fd *ma_sfp; + + SG_LOG(3, sl_sfp->parentdp, "%s: SHARE: master_fd: %d\n", __func__, + m_fd); + if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) + return -EACCES; + if (m_fd < 0) + return -EBADF; + if (sl_sfp->shr_fd != SG_SHARE_FD_UNUSED) + return -EADDRINUSE; /* don't allow chain of shares */ + /* Alternate approach: fcheck_files(current->files, m_fd) */ + fp = fget(m_fd); + if (!fp) + return -ENOENT; + if (sl_sfp->filp == fp) {/* share with self confusing */ + res = -ELOOP; + goto fini; + } + SG_LOG(6, sl_sfp->parentdp, "%s: master fd okay, scan for filp=0x%p\n", + __func__, fp); +again: + ma_sfp = sg_find_sfp_by_fd(fp, m_fd, sl_sfp, false); + if (IS_ERR(ma_sfp)) { + res = PTR_ERR(ma_sfp); + if (res == -EPROBE_DEFER) { + if (++retry_count > SG_ADD_RQ_MAX_RETRIES) { + res = -EBUSY; + } else { + res = 0; + cpu_relax(); + goto again; + } + } + } else { + found = !!ma_sfp; + } +fini: + /* paired with fp=fget(m_fd) above */ + fput(fp); + if (unlikely(res)) + return res; + return found ? 0 : -ENOTSOCK; /* ENOTSOCK for fd exists but not sg */ +} + +/* + * After checking the proposed master-slave relationship is unique and + * valid, sets up pointers between master and slave sg_fd objects. Return + * 0 on success or negated errno value. 
+static int +sg_fd_reshare(struct sg_fd *ma_sfp, int new_sl_fd) +{ + bool found = false; + int res = 0; + int retry_count = 0; + unsigned long iflags; + struct file *fp; + struct sg_fd *sl_sfp = ma_sfp->share_sfp; + enum sg_rq_state rq_st; + + SG_LOG(3, ma_sfp->parentdp, "%s: new_slave_fd: %d\n", __func__, + new_sl_fd); + if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) + return -EACCES; + if (new_sl_fd < 0) + return -EBADF; + if (ma_sfp->shr_fd != SG_SHARE_FD_MASTER || !sl_sfp) + return -EINVAL; + if (!ma_sfp->rsv_srp) + res = -EPROTO; /* Internal error */ + rcu_read_lock(); + rq_st = atomic_read(&ma_sfp->rsv_srp->rq_st); + if (rq_st != SG_RS_INACTIVE) + res = -EBUSY; + rcu_read_unlock(); + if (unlikely(res)) + return res; + + /* Alternate approach: fcheck_files(current->files, m_fd) */ + fp = fget(new_sl_fd); + if (!fp) + return -ENOENT; + if (ma_sfp->filp == fp) {/* share with self confusing */ + res = -ELOOP; + goto fini; + } + SG_LOG(6, sl_sfp->parentdp, "%s: slave fd ok, scan for filp=0x%p\n", + __func__, fp); + /* clean up old slave side */ + spin_lock_irqsave(&sl_sfp->rq_list_lock, iflags); + sl_sfp->share_sfp = NULL; + sl_sfp->shr_fd = SG_SHARE_FD_UNUSED; + spin_unlock_irqrestore(&sl_sfp->rq_list_lock, iflags); + +again: + sl_sfp = sg_find_sfp_by_fd(fp, new_sl_fd, ma_sfp, true); + if (IS_ERR(sl_sfp)) { + res = PTR_ERR(sl_sfp); + if (res == -EPROBE_DEFER) { + if (++retry_count > SG_ADD_RQ_MAX_RETRIES) { + res = -EBUSY; + } else { + res = 0; + cpu_relax(); + goto again; + } + } + } else { + found = !!sl_sfp; + } +fini: + /* paired with fp=fget(new_sl_fd) above */ + fput(fp); + if (unlikely(res)) + return res; + return found ? 0 : -ENOTSOCK; /* ENOTSOCK for fd exists but not sg */ +} + /* * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and * <= max_segment_size. Exit if that is the same as old size; otherwise @@ -1918,6 +2203,16 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) else *c_flgsp &= ~SG_CTL_FLAGM_Q_TAIL; } + /* + * UNSHARE boolean: when reading yields zero. When writing true, + * unshares this fd from a previously established fd share. If + * a shared command is inflight, waits a little while for it + * to finish.
+ */ + if (c_flgs_wm & SG_CTL_FLAGM_UNSHARE) + sg_chk_unshare(sfp, !!(*c_flgsp & SG_CTL_FLAGM_UNSHARE)); + if (c_flgs_rm & SG_CTL_FLAGM_UNSHARE) + *c_flgsp &= ~SG_CTL_FLAGM_UNSHARE; /* clear bit */ /* NO_DURATION boolean, [rbw] */ if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION) flg = test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm); @@ -2051,6 +2346,32 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p) } if ((s_rd_mask & SG_SEIM_READ_VAL) && (s_wr_mask & SG_SEIM_READ_VAL)) sg_extended_read_value(sfp, seip); + /* slave side gives fd of master to share with [rbw] */ + if (or_masks & SG_SEIM_SHARE_FD) { + mutex_lock(&sfp->f_mutex); + n = sfp->shr_fd; /* hold prior shr_fd for later read */ + if (s_wr_mask & SG_SEIM_SHARE_FD) { + result = sg_fd_share(sfp, (int)seip->share_fd); + if (ret == 0 && result) + ret = result; + } + if (s_rd_mask & SG_SEIM_SHARE_FD) + seip->share_fd = (unsigned int)n; + mutex_unlock(&sfp->f_mutex); + } + /* master side is given shr_fd of new slave [rbw] */ + if (or_masks & SG_SEIM_CHG_SHARE_FD) { + mutex_lock(&sfp->f_mutex); + n = sfp->shr_fd; /* hold prior sl_fd for later read */ + if (s_wr_mask & SG_SEIM_CHG_SHARE_FD) { + result = sg_fd_reshare(sfp, (int)seip->share_fd); + if (ret == 0 && result) + ret = result; + } + if (s_rd_mask & SG_SEIM_CHG_SHARE_FD) + seip->share_fd = (unsigned int)n; + mutex_unlock(&sfp->f_mutex); + } /* override scatter gather element size [rbw] (def: SG_SCATTER_SZ) */ if (or_masks & SG_SEIM_SGAT_ELEM_SZ) { n = sfp->sgat_elem_sz; @@ -2193,19 +2514,19 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) switch (cmd_in) { case SG_IO: - return sg_ctl_sg_io(filp, sdp, sfp, p); + return sg_ctl_sg_io(sdp, sfp, p); case SG_IOSUBMIT: SG_LOG(3, sdp, "%s: SG_IOSUBMIT\n", __func__); - return sg_ctl_iosubmit(filp, sfp, p); + return sg_ctl_iosubmit(sfp, p); case SG_IOSUBMIT_V3: SG_LOG(3, sdp, "%s: SG_IOSUBMIT_V3\n", __func__); - return sg_ctl_iosubmit_v3(filp, sfp, p); + return sg_ctl_iosubmit_v3(sfp, p); case SG_IORECEIVE: SG_LOG(3, sdp, "%s: SG_IORECEIVE\n", __func__); - return sg_ctl_ioreceive(filp, sfp, p); + return sg_ctl_ioreceive(sfp, p); case SG_IORECEIVE_V3: SG_LOG(3, sdp, "%s: SG_IORECEIVE_V3\n", __func__); - return sg_ctl_ioreceive_v3(filp, sfp, p); + return sg_ctl_ioreceive_v3(sfp, p); case SG_IOABORT: SG_LOG(3, sdp, "%s: SG_IOABORT\n", __func__); if (read_only) @@ -3205,8 +3526,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) if (cwrp->cmd_len > BLK_MAX_CDB) scsi_rp->cmd = long_cmdp; if (cwrp->u_cmdp) - res = sg_fetch_cmnd(cwrp->filp, sfp, cwrp->u_cmdp, - cwrp->cmd_len, scsi_rp->cmd); + res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len, + scsi_rp->cmd); else if (cwrp->cmdp) memcpy(scsi_rp->cmd, cwrp->cmdp, cwrp->cmd_len); else @@ -3850,7 +4171,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) /* Returns pointer to sg_fd object or negated errno twisted by ERR_PTR */ static struct sg_fd * -sg_add_sfp(struct sg_device *sdp) +sg_add_sfp(struct sg_device *sdp, struct file *filp) { bool reduced = false; int rbuf_len; @@ -3872,6 +4193,7 @@ sg_add_sfp(struct sg_device *sdp) mutex_init(&sfp->f_mutex); sfp->timeout = SG_DEFAULT_TIMEOUT; sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER; + sfp->filp = filp; /* other bits in sfp->ffd_bm[1] cleared by kzalloc() above */ assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID); assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q); @@ -4005,17 +4327,6 @@ sg_remove_sfp(struct kref *kref) schedule_work(&sfp->ew_fd.work); } -static int 
-sg_idr_max_id(int id, void *p, void *data) -{ - int *k = data; - - if (*k < id) - *k = id; - - return 0; -} - /* must be called with sg_index_lock held */ static struct sg_device * sg_lookup_dev(int dev) diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index 5223ba33fb8d..a3fa26644496 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -170,6 +170,8 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ #define SG_SEIM_RESERVED_SIZE 0x4 /* reserved_sz of reserve request */ #define SG_SEIM_TOT_FD_THRESH 0x8 /* tot_fd_thresh of data buffers */ #define SG_SEIM_MINOR_INDEX 0x10 /* sg device minor index number */ +#define SG_SEIM_SHARE_FD 0x20 /* slave gives fd of master: sharing */ +#define SG_SEIM_CHG_SHARE_FD 0x40 /* master gives fd of new slave */ #define SG_SEIM_SGAT_ELEM_SZ 0x80 /* sgat element size (>= PAGE_SIZE) */ #define SG_SEIM_ALL_BITS 0xff /* should be OR of previous items */ @@ -179,6 +181,7 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ #define SG_CTL_FLAGM_OTHER_OPENS 0x4 /* rd: other sg fd_s on this dev */ #define SG_CTL_FLAGM_ORPHANS 0x8 /* rd: orphaned requests on this fd */ #define SG_CTL_FLAGM_Q_TAIL 0x10 /* used for future cmds on this fd */ +#define SG_CTL_FLAGM_UNSHARE 0x80 /* undo share after inflight cmd */ #define SG_CTL_FLAGM_NO_DURATION 0x400 /* don't calc command duration */ #define SG_CTL_FLAGM_MORE_ASYNC 0x800 /* yield EAGAIN in more cases */ #define SG_CTL_FLAGM_ALL_BITS 0xfff /* should be OR of previous items */
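Tying the SG_SEIM_SHARE_FD pieces of this patch together, a minimal user-space sketch of establishing a share follows. Illustrative only: the sei_wr_mask field name is an assumption based on how this series drives ioctl(SG_SET_GET_EXTENDED), both capabilities checked in sg_fd_share() are required, and error handling is omitted.

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>    /* sg v4 interface as proposed in this series */

/*
 * Illustrative sketch: on the slave-to-be fd, register the master's fd;
 * this reaches sg_fd_share() in the driver. Needs CAP_SYS_ADMIN and
 * CAP_SYS_RAWIO. Both fds stay open for the lifetime of the share.
 */
static int establish_share(int ma_fd, int sl_fd)
{
        struct sg_extended_info sei;

        memset(&sei, 0, sizeof(sei));
        sei.sei_wr_mask = SG_SEIM_SHARE_FD;     /* field name assumed */
        sei.share_fd = (unsigned int)ma_fd;     /* slave names its master */
        return ioctl(sl_fd, SG_SET_GET_EXTENDED, &sei);
}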
From patchwork Fri May 24 18:48:06 2019 X-Patchwork-Submitter: Douglas Gilbert X-Patchwork-Id: 10960355 From: Douglas Gilbert To: linux-scsi@vger.kernel.org Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com Subject: [PATCH 16/19] sg: add shared requests Date: Fri, 24 May 2019 14:48:06 -0400 Message-Id: <20190524184809.25121-17-dgilbert@interlog.com> In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com> References: <20190524184809.25121-1-dgilbert@interlog.com> Add request sharing, which is invoked on a shared file descriptor by using SGV4_FLAG_SHARE. The file share is asymmetric: the master side is assumed to do a data-in command (e.g. READ) first, followed by the slave side doing a data-out command (e.g. WRITE). The master side may also set SG_FLAG_NO_DXFER; the slave side must set that flag. If both sides set that flag then a single bio is used and user space doesn't "see" the data. If the master side does not set SG_FLAG_NO_DXFER then the read data is copied to user space. That copy to user space can be displaced by using SG_FLAG_MMAP_IO (but that adds some other overheads). See the webpage at: http://sg.danny.cz/sg/sg_v40.html in the section titled: "7 Request sharing". Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 812 +++++++++++++++++++++++++++++++++++------ include/uapi/scsi/sg.h | 8 + 2 files changed, 705 insertions(+), 115 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 44f09c65e0b9..9adca3988c58 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -89,6 +89,17 @@ enum sg_rq_state { SG_RS_AWAIT_RD, /* response received, awaiting read */ SG_RS_DONE_RD, /* read is ongoing or done */ SG_RS_BUSY, /* temporary state should rarely be seen */ + SG_RS_SHR_SWAP, /* swap: master finished, awaiting slave */ + SG_RS_SHR_SLAVE, /* master waits while slave inflight */ +}; + +/* slave sets up sharing: ioctl(sl_fd,SG_SET_GET_EXTENDED(SHARE_FD(ma_fd))) */ +enum sg_shr_var { + SG_SHR_NONE = 0, /* no sharing on owning fd */ + SG_SHR_MA_FD_NOT_RQ, /* master sharing on fd but not this req */ + SG_SHR_MA_RQ, /* master sharing on this req */ + SG_SHR_SL_FD_NOT_RQ, /* slave sharing on fd but not this req */ + SG_SHR_SL_RQ, /* slave sharing on this req */ }; /* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */ @@ -130,6 +141,7 @@ enum sg_rq_state { #define SG_FFD_MMAP_CALLED 3 /* mmap(2) system call made on fd */ #define SG_FFD_TIME_IN_NS 4 /* set: time in nanoseconds, else ms */ #define SG_FFD_Q_AT_TAIL 5 /* set: queue reqs at tail of blk q */ +#define SG_FFD_MASTER_ERR 6 /* master side of share had error */ #define SG_FFD_PREFER_TAG 7 /* prefer tag over pack_id (def) */ #define SG_FFD_RELEASE 8 /* release (close) underway */ #define SG_FFD_NO_DURATION 9 /* don't do command duration calc */ @@ -202,6 +214,7 @@ struct sg_request { /* active SCSI command or inactive on free list (fl) */ struct list_head fl_entry; /* member of rq_fl */ spinlock_t req_lck; struct sg_scatter_hold sgat_h; /* hold buffer, perhaps scatter list */ + struct sg_scatter_hold *sgatp; /* ptr to prev unless slave shr req */ union { struct sg_slice_hdr3 s_hdr3; /* subset of sg_io_hdr */ struct sg_slice_hdr4 s_hdr4; /* reduced size struct sg_io_v4 */ @@ -214,6 +227,7 @@ struct sg_request { /* active SCSI command or inactive on free list (fl) */ int pack_id; /* v3 pack_id or in v4 request_extra field */ int sense_len; /* actual
sense buffer length (data-in) */ atomic_t rq_st; /* request state, holds a enum sg_rq_state */ + enum sg_shr_var sh_var; /* sharing variety, SG_SHR_NONE=0 if none */ u8 cmd_opcode; /* first byte of SCSI cdb */ int tag; /* block layer identifier of request */ u64 start_ns; /* starting point of command duration calc */ @@ -246,6 +260,7 @@ struct sg_fd { /* holds the state of a file descriptor */ u8 next_cmd_len; /* 0: automatic, >0: use on next write() */ struct file *filp; /* my identity when sharing */ struct sg_request *rsv_srp;/* one reserve request per fd */ + struct sg_request *slave_srp; /* non-NULL when rsv SG_SHR_MA_RQ */ struct sg_fd *share_sfp;/* master+slave shares set this, else NULL */ struct fasync_struct *async_qp; /* used by asynchronous notification */ struct kref f_ref; @@ -304,7 +319,9 @@ static void sg_remove_sfp(struct kref *); static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag); static struct sg_request *sg_add_request(struct sg_comm_wr_t *cwrp, - int dxfr_len); + enum sg_shr_var sh_var, int dxfr_len); +static int sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, + struct request *rq); static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp); static struct sg_device *sg_get_dev(int min_dev); static void sg_device_destroy(struct kref *kref); @@ -312,6 +329,7 @@ static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len); static void sg_calc_sgat_param(struct sg_device *sdp); static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str); +static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str); static void sg_rep_rq_state_fail(struct sg_device *sdp, enum sg_rq_state exp_old_st, enum sg_rq_state want_st, @@ -327,6 +345,8 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp, #define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE) #define SG_RS_AWAIT_READ(srp) (atomic_read(&(srp)->rq_st) == SG_RS_AWAIT_RD) +#define SG_MA_THIS_RQ(srp) ((srp)->sh_var == SG_SHR_MA_RQ) +#define SG_IS_SYNC_INVOC(srp) test_bit(SG_FRQ_SYNC_INVOC, (srp)->frq_bm) /* * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages. @@ -417,8 +437,8 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl) mutex_unlock(&sdp->open_rel_lock); res = wait_event_interruptible (sdp->open_wait, - unlikely((atomic_read(&sdp->detaching)) || - !sdp->exclude)); + unlikely(atomic_read(&sdp->detaching)) || + !sdp->exclude); mutex_lock(&sdp->open_rel_lock); if (unlikely(res)) /* -ERESTARTSYS */ @@ -469,7 +489,7 @@ sg_open(struct inode *inode, struct file *filp) nonseekable_open(inode, filp); o_excl = !!(op_flags & O_EXCL); non_block = !!(op_flags & O_NONBLOCK); - if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY)) + if (unlikely(o_excl) && ((op_flags & O_ACCMODE) == O_RDONLY)) return -EPERM;/* not permitted, need write access for O_EXCL */ sdp = sg_get_dev(min_dev); if (IS_ERR(sdp)) @@ -595,6 +615,10 @@ sg_release(struct inode *inode, struct file *filp) * of the synchronous ioctl(SG_IO) system call. */ +/* + * This is the write(2) system call entry point. v4 interface disallowed. + * Returns count or a negated errno value. 
+ */ static ssize_t sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos) { @@ -719,7 +743,7 @@ sg_fetch_cmnd(struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp) return -EMSGSIZE; if (copy_from_user(cdbp, u_cdbp, len)) return -EFAULT; - if (O_RDWR != (sfp->filp->f_flags & O_ACCMODE)) { /* read-only */ + if (O_RDWR != (sfp->filp->f_flags & O_ACCMODE)) { /* read-only */ switch (sfp->parentdp->device->type) { case TYPE_DISK: case TYPE_RBC: @@ -744,7 +768,7 @@ sg_v3_submit(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, if (hp->flags & SG_FLAG_MMAP_IO) { if (!list_empty(&sfp->rq_list)) return -EBUSY; /* already active requests on fd */ - if (hp->dxfer_len > sfp->rsv_srp->sgat_h.buflen) + if (hp->dxfer_len > sfp->rsv_srp->sgatp->buflen) return -ENOMEM; /* MMAP_IO size must fit in reserve */ if (hp->flags & SG_FLAG_DIRECT_IO) return -EINVAL; /* not both MMAP_IO and DIRECT_IO */ @@ -789,7 +813,7 @@ sg_v4_submit(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, len = h4p->dout_xfer_len; if (!list_empty(&sfp->rq_list)) return -EBUSY; /* already active requests on fd */ - if (len > sfp->rsv_srp->sgat_h.buflen) + if (len > sfp->rsv_srp->sgatp->buflen) return -ENOMEM; /* MMAP_IO size must fit in reserve */ if (h4p->flags & SG_FLAG_DIRECT_IO) return -EINVAL; /* not both MMAP_IO and DIRECT_IO */ @@ -854,6 +878,53 @@ sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p) return -EPERM; } +/* + * Assumes sharing has been established at the file descriptor level and now we + * check the rq_flags of a new request/command. SGV4_FLAG_NO_DXFER may or may + * not be used on the master side, it must be used on the slave side. Also + * returns (via *sh_varp) the proposed sg_request::sh_var of the new request + * yet to be built/re-used. 
+ */ +static int +sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir, + enum sg_shr_var *sh_varp) +{ + bool is_master = (sfp->shr_fd == SG_SHARE_FD_MASTER); + int result = 0; + enum sg_shr_var sh_var = SG_SHR_NONE; + + if (rq_flags & SGV4_FLAG_SHARE) { + if (rq_flags & SG_FLAG_DIRECT_IO) + result = -EINVAL; /* since no control of data buffer */ + else if (dxfer_len < 1) + result = -ENODATA; + else if (is_master) { /* fd is reader */ + sh_var = SG_SHR_MA_RQ; + if (dir != SG_DXFER_FROM_DEV) + result = -ENOMSG; + if (rq_flags & SGV4_FLAG_NO_DXFER) { + /* rule out some contradictions */ + if (rq_flags & SG_FL_MMAP_DIRECT) + result = -ENODATA; + } + } else { /* fd is slave, writer */ + sh_var = SG_SHR_SL_RQ; + if (dir != SG_DXFER_TO_DEV) + result = -ENOMSG; + if (!(rq_flags & SGV4_FLAG_NO_DXFER)) + result = -ENOMSG; + if (rq_flags & SG_FL_MMAP_DIRECT) + result = -ENODATA; + } + } else if (is_master) { + sh_var = SG_SHR_MA_FD_NOT_RQ; + } else { + sh_var = SG_SHR_SL_FD_NOT_RQ; + } + *sh_varp = sh_var; + return result; +} + static void sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) { @@ -862,7 +933,7 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp) struct sg_device *sdp = sfp->parentdp; is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm); - sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); + sync = SG_IS_SYNC_INVOC(srp); SG_LOG(3, sdp, "%s: is_v4h=%d\n", __func__, (int)is_v4h); if (test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm)) srp->start_ns = 0; @@ -927,6 +998,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp) int dxfr_len, dir; int pack_id = SG_PACK_ID_WILDCARD; u32 rq_flags; + enum sg_shr_var sh_var; struct sg_fd *fp = cwrp->sfp; struct sg_device *sdp = fp->parentdp; struct sg_request *srp; @@ -957,10 +1029,20 @@ sg_common_write(struct sg_comm_wr_t *cwrp) rq_flags = hi_p->flags; pack_id = hi_p->pack_id; } + if (fp->shr_fd == SG_SHARE_FD_UNUSED) { + /* no sharing established on this fd */ + sh_var = SG_SHR_NONE; + if (rq_flags & SGV4_FLAG_SHARE) + return ERR_PTR(-ENOMSG); + } else { + res = sg_share_chk_flags(fp, rq_flags, dxfr_len, dir, &sh_var); + if (unlikely(res < 0)) + return ERR_PTR(res); + } if (dxfr_len >= SZ_256M) return ERR_PTR(-EINVAL); - srp = sg_add_request(cwrp, dxfr_len); + srp = sg_add_request(cwrp, sh_var, dxfr_len); if (IS_ERR(srp)) return srp; srp->rq_flags = rq_flags; @@ -1099,17 +1181,92 @@ sg_rec_v3v4_state(struct sg_fd *sfp, struct sg_request *srp, bool v4_active) int err = 0; int sb_len_wr; u32 rq_res = srp->rq_result; + enum sg_shr_var sh_var = srp->sh_var; sb_len_wr = sg_copy_sense(srp, v4_active); if (sb_len_wr < 0) return sb_len_wr; if (rq_res & SG_ML_RESULT_MSK) srp->rq_info |= SG_INFO_CHECK; + if (test_bit(SG_FRQ_ABORTING, srp->frq_bm)) + srp->rq_info |= SG_INFO_ABORTED; + + if (sh_var == SG_SHR_SL_RQ && sfp->share_sfp) { + struct sg_request *ma_srp = sfp->share_sfp->rsv_srp; + enum sg_rq_state mar_st = atomic_read(&ma_srp->rq_st); + + switch (mar_st) { + case SG_RS_SHR_SWAP: + case SG_RS_SHR_SLAVE: + /* already on master's free list, make re-usable */ + ma_srp->tag = SG_TAG_WILDCARD; + ma_srp->sh_var = SG_SHR_NONE; + err = sg_rstate_chg(ma_srp, mar_st, SG_RS_INACTIVE); + break; + case SG_RS_INACTIVE: + case SG_RS_AWAIT_RD: + sfp->share_sfp->slave_srp = NULL; + break; /* nothing to do */ + default: + err = -EPROTO; /* Logic error */ + SG_LOG(1, sfp->parentdp, + "%s: SHR_SL_ACTIVE, bad master state: %s\n", + __func__, sg_rq_st_str(mar_st, true)); + break; /* nothing to do */ + } + } if 
(unlikely(atomic_read(&sfp->parentdp->detaching))) srp->rq_info |= SG_INFO_DEVICE_DETACHING; return err; } +static int +sg_v3v4_complete(struct sg_fd *sfp, struct sg_request *srp, bool other_err) +{ + int err = 0; + enum sg_rq_state sr_st = atomic_read(&srp->rq_st); + + /* advance state machine, send signal to slave if appropriate */ + switch (srp->sh_var) { + case SG_SHR_MA_RQ: + { + int poll_type = POLL_OUT; + + if ((srp->rq_result & SG_ML_RESULT_MSK) || other_err) { + set_bit(SG_FFD_MASTER_ERR, sfp->ffd_bm); + err = sg_rstate_chg(srp, sr_st, SG_RS_DONE_RD); + poll_type = POLL_HUP; /* "Hang-UP" flag */ + } else if (sr_st != SG_RS_SHR_SWAP) { + err = sg_rstate_chg(srp, sr_st, + SG_RS_SHR_SWAP); + } + if (sfp->share_sfp) + kill_fasync(&sfp->share_sfp->async_qp, + SIGPOLL, poll_type); + } + break; + case SG_SHR_SL_RQ: /* cleanup both on slave completion */ + { + struct sg_fd *ma_sfp = sfp->share_sfp; + + if (ma_sfp) { + ma_sfp->slave_srp = NULL; + if (ma_sfp->rsv_srp) + ma_sfp->rsv_srp->sh_var = + SG_SHR_MA_FD_NOT_RQ; + } + } + srp->sh_var = SG_SHR_SL_FD_NOT_RQ; + srp->sgatp = &srp->sgat_h; + err = sg_rstate_chg(srp, sr_st, SG_RS_DONE_RD); + break; + default: + err = sg_rstate_chg(srp, sr_st, SG_RS_DONE_RD); + break; + } + return err; +}
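The kill_fasync() calls in sg_v3v4_complete() above nudge the share partner asynchronously: POLL_OUT when the master's data has landed, POLL_HUP on error. Arming that notification on the slave fd is plain fcntl(2) machinery, nothing sg-specific; a short sketch (illustrative only, error reporting omitted):

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Illustrative sketch: register for the SIGPOLL that the driver's
 * kill_fasync(..., SIGPOLL, POLL_OUT/POLL_HUP) will deliver.
 */
static int arm_share_notification(int sl_fd, void (*handler)(int))
{
        struct sigaction sa = { .sa_handler = handler };

        if (sigaction(SIGPOLL, &sa, NULL) < 0)
                return -1;
        if (fcntl(sl_fd, F_SETOWN, getpid()) < 0)
                return -1;
        /* O_ASYNC (FASYNC) makes this fd a kill_fasync() target */
        return fcntl(sl_fd, F_SETFL, fcntl(sl_fd, F_GETFL) | O_ASYNC);
}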
static int sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct sg_io_v4 *h4p) @@ -1132,10 +1289,10 @@ sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, h4p->duration = srp->duration; switch (srp->s_hdr4.dir) { case SG_DXFER_FROM_DEV: - h4p->din_xfer_len = srp->sgat_h.dlen; + h4p->din_xfer_len = srp->sgatp->dlen; break; case SG_DXFER_TO_DEV: - h4p->dout_xfer_len = srp->sgat_h.dlen; + h4p->dout_xfer_len = srp->sgatp->dlen; break; default: break; @@ -1149,7 +1306,7 @@ sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, if (copy_to_user(p, h4p, SZ_SG_IO_V4)) err = err ? err : -EFAULT; } - err2 = sg_rstate_chg(srp, atomic_read(&srp->rq_st), SG_RS_DONE_RD); + err2 = sg_v3v4_complete(sfp, srp, err < 0); if (err2) err = err ? err : err2; sg_finish_scsi_blk_rq(srp); @@ -1199,9 +1356,9 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p) return -ENODEV; if (non_block) return -EAGAIN; - res = wait_event_interruptible(sfp->read_wait, - sg_get_ready_srp(sfp, &srp, - id, use_tag)); + res = wait_event_interruptible + (sfp->read_wait, + sg_get_ready_srp(sfp, &srp, id, use_tag)); if (unlikely(atomic_read(&sdp->detaching))) return -ENODEV; if (unlikely(res)) @@ -1401,6 +1558,19 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos) ret = get_user(want_id, &h3_up->pack_id); if (ret) return ret; + if (!non_block) { + int flgs; + + ret = get_user(flgs, &h3_up->flags); + if (ret) + return ret; + if (flgs & SGV4_FLAG_IMMED) + non_block = true; + } + } else if (v3_hdr->interface_id == 'Q') { + pr_info_once("sg: %s: v4 interface%s here\n", + __func__, " disallowed"); + return -EPERM; } else { return -EPERM; } @@ -1448,7 +1618,8 @@ sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) struct sg_io_hdr hdr3; struct sg_io_hdr *hp = &hdr3; - SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp); + SG_LOG(3, sfp->parentdp, "%s: sh_var: %s srp=0x%p\n", __func__, + sg_shr_str(srp->sh_var, false), srp); err = sg_rec_v3v4_state(sfp, srp, false); memset(hp, 0, sizeof(*hp)); memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3)); @@ -1464,7 +1635,7 @@ sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p) /* copy_*_user() [without leading underscores] checks access */ if (copy_to_user(p, hp, SZ_SG_IO_HDR)) err = err ? err : -EFAULT; - err2 = sg_rstate_chg(srp, atomic_read(&srp->rq_st), SG_RS_DONE_RD); + err2 = sg_v3v4_complete(sfp, srp, err < 0); if (err2) err = err ? err : err2; sg_finish_scsi_blk_rq(srp); @@ -1514,11 +1685,77 @@ sg_calc_sgat_param(struct sg_device *sdp) sdp->max_sgat_sz = sz; } +/* + * Only valid for shared file descriptors, else -EINVAL. Should only be + * called after a master side request has successfully completed so that + * there is valid data in the reserve buffer. If fini1_again0 is true then + * the master is taken out of the state waiting for a slave request and + * put in the inactive state. If fini1_again0 is false (0) then the master + * (assuming it is inactive) is put in a state waiting for a slave request. + * This function is called when the write mask is set on + * ioctl(SG_SET_GET_EXTENDED(SG_CTL_FLAGM_MASTER_FINI)).
+static int +sg_change_after_master_rq(struct sg_fd *sfp, bool fini1_again0) +{ + int res = 0; + enum sg_rq_state sr_st; + struct sg_request *ma_rsv_srp = NULL; + + rcu_read_lock(); + if (sfp->shr_fd == SG_SHARE_FD_MASTER) { /* from master */ + ma_rsv_srp = sfp->rsv_srp; + } else if (sfp->shr_fd == SG_SHARE_FD_UNUSED) { + res = -EINVAL; + } else { /* from slave */ + if (sfp->share_sfp) + ma_rsv_srp = sfp->share_sfp->rsv_srp; + else + res = -EINVAL; + } + if (res == 0 && ma_rsv_srp) { + spin_lock(&ma_rsv_srp->req_lck); + sr_st = atomic_read(&ma_rsv_srp->rq_st); + if (fini1_again0) { + switch (sr_st) { + case SG_RS_SHR_SWAP: + ma_rsv_srp->sh_var = SG_SHR_MA_FD_NOT_RQ; + res = sg_rstate_chg(ma_rsv_srp, sr_st, + SG_RS_INACTIVE); + break; + case SG_RS_SHR_SLAVE: /* too late, slave rq active */ + case SG_RS_BUSY: + res = -EAGAIN; + break; + default: /* master in any other state is bad */ + res = -EINVAL; + break; + } + } else if (sr_st == SG_RS_INACTIVE) { + ma_rsv_srp->sh_var = SG_SHR_MA_RQ; + res = sg_rstate_chg(ma_rsv_srp, sr_st, SG_RS_SHR_SWAP); + } else { + res = -EBUSY; /* master busy doing something else */ + } + spin_unlock(&ma_rsv_srp->req_lck); + } + rcu_read_unlock(); + if (unlikely(res)) { + SG_LOG(1, sfp->parentdp, "%s: err=%d\n", __func__, -res); + } else { + SG_LOG(6, sfp->parentdp, "%s: okay, fini1_again0=%d\n", + __func__, fini1_again0); + } + return res; +} + static void sg_unshare_fds(struct sg_fd *ma_sfp, struct sg_fd *sl_sfp) { if (ma_sfp) { ma_sfp->share_sfp = NULL; + ma_sfp->slave_srp = NULL; ma_sfp->shr_fd = SG_SHARE_FD_UNUSED; } if (sl_sfp) { @@ -1527,6 +1764,64 @@ sg_unshare_fds(struct sg_fd *ma_sfp, struct sg_fd *sl_sfp) } } +/* Placed here with other share/unshare processing functions */ +static void +sg_remove_sfp_share(struct sg_fd *sfp) +{ + bool o_sfp_is_master = false; + int res = 0; + enum sg_rq_state sr_st; + struct sg_fd *o_sfp = sfp->share_sfp; + struct sg_request *rsv_srp = NULL; + + SG_LOG(3, sfp->parentdp, "%s: sfp=0x%p, o_sfp=0x%p%s\n", __func__, + sfp, o_sfp, + ((sfp->shr_fd == SG_SHARE_FD_MASTER) ? " master" : "")); + if (!o_sfp) + return; + if (sfp->shr_fd == SG_SHARE_FD_MASTER) { /* close on master */ + rsv_srp = sfp->rsv_srp; + } else if (sfp->shr_fd == SG_SHARE_FD_UNUSED) { + return; + } else { /* close on slave */ + o_sfp_is_master = true; + rsv_srp = o_sfp->rsv_srp; + } + spin_lock(&o_sfp->rq_list_lock); + sg_unshare_fds(o_sfp, NULL); + spin_unlock(&o_sfp->rq_list_lock); + if (!rsv_srp) { + res = -EPROTO; + goto fini; + } + spin_lock(&rsv_srp->req_lck); + if (o_sfp_is_master) { + if (rsv_srp->sh_var == SG_SHR_MA_RQ) { + sr_st = atomic_read(&rsv_srp->rq_st); + switch (sr_st) { + case SG_RS_SHR_SLAVE: + case SG_RS_SHR_SWAP: + res = sg_rstate_chg(rsv_srp, sr_st, + SG_RS_INACTIVE); + break; + case SG_RS_BUSY: + res = -EBUSY; + break; + default: + break; + } + rsv_srp->sh_var = SG_SHR_NONE; + } + } else { + rsv_srp->sh_var = SG_SHR_NONE; + } + spin_unlock(&rsv_srp->req_lck); +fini: + if (unlikely(res)) { + SG_LOG(1, sfp->parentdp, "%s: internal err=%d\n", __func__, + -res); + } +}
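Stepping back from the teardown paths, the intended happy path for a shared request (per the commit message: master READs, then slave WRITEs the same data) would look roughly like this from user space. Illustrative only: cdb preparation, error handling and the fd-share setup from the previous patch are omitted, and ma_h/sl_h are assumed to be pre-filled v4 headers.

#include <sys/ioctl.h>
#include <scsi/sg.h>    /* sg v4 interface as proposed in this series */

/*
 * Illustrative sketch of one shared READ/WRITE cycle. The slave must
 * set SGV4_FLAG_NO_DXFER so the master's reserve buffer is reused; the
 * master may set it too, in which case a single bio carries the data.
 */
static int share_one_cycle(int ma_fd, int sl_fd,
                            struct sg_io_v4 *ma_h, struct sg_io_v4 *sl_h)
{
        ma_h->flags |= SGV4_FLAG_SHARE;         /* master: data-in (READ) */
        if (ioctl(ma_fd, SG_IO, ma_h) < 0)
                return -1;
        sl_h->flags |= SGV4_FLAG_SHARE | SGV4_FLAG_NO_DXFER;
        if (ioctl(sl_fd, SG_IO, sl_h) < 0)      /* slave: data-out (WRITE) */
                return -1;
        return 0;
}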
/* * Active when writing 1 to ioctl(SG_SET_GET_EXTENDED(CTL_FLAGS(UNSHARE))), * writing 0 has no effect. Undoes the configuration that was done by @@ -1538,6 +1833,7 @@ sg_chk_unshare(struct sg_fd *sfp, bool unshare_val) bool retry; int retry_count = 0; unsigned long iflags; + struct sg_request *ma_rsv_srp; struct sg_fd *ma_sfp; struct sg_fd *sl_sfp; struct sg_fd *o_sfp = sfp->share_sfp;/* other side of existing share */ @@ -1552,17 +1848,20 @@ sg_chk_unshare(struct sg_fd *sfp, bool unshare_val) if (sfp->shr_fd == SG_SHARE_FD_MASTER) { /* called on master fd */ ma_sfp = sfp; sl_sfp = o_sfp; - if (!spin_trylock(&sl_sfp->rq_list_lock)) { - if (++retry_count > SG_ADD_RQ_MAX_RETRIES) - SG_LOG(1, sfp->parentdp, - "%s: cannot get slave lock\n", - __func__); - else - retry = true; - goto fini; + ma_rsv_srp = ma_sfp->rsv_srp; + if (ma_rsv_srp && !SG_MA_THIS_RQ(ma_rsv_srp)) { + if (!spin_trylock(&sl_sfp->rq_list_lock)) { + if (++retry_count > SG_ADD_RQ_MAX_RETRIES) + SG_LOG(1, sfp->parentdp, + "%s: cannot get slave lock\n", + __func__); + else + retry = true; + goto fini; + } + sg_unshare_fds(ma_sfp, sl_sfp); + spin_unlock(&sl_sfp->rq_list_lock); } - sg_unshare_fds(ma_sfp, sl_sfp); - spin_unlock(&sl_sfp->rq_list_lock); } else { /* called on slave fd */ ma_sfp = o_sfp; sl_sfp = sfp; @@ -1575,7 +1874,9 @@ sg_chk_unshare(struct sg_fd *sfp, bool unshare_val) retry = true; goto fini; } - sg_unshare_fds(ma_sfp, sl_sfp); + ma_rsv_srp = ma_sfp->rsv_srp; + if (!SG_MA_THIS_RQ(ma_rsv_srp)) + sg_unshare_fds(ma_sfp, sl_sfp); spin_unlock(&ma_sfp->rq_list_lock); } fini: @@ -1633,6 +1934,8 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp, break; case SG_RS_AWAIT_RD: case SG_RS_DONE_RD: + case SG_RS_SHR_SWAP: + case SG_RS_SHR_SLAVE: res = srp->duration; is_dur = true; /* completion has occurred, timing finished */ break; @@ -1656,7 +1959,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, if (rip->duration == U32_MAX) rip->duration = 0; rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm); - rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm); + rip->sg_io_owned = SG_IS_SYNC_INVOC(srp); rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK); rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ? srp->tag : srp->pack_id; @@ -1718,18 +2021,17 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, * Returns 0 on success else a negated errno. */ static int -sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, - void __user *p) +sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) { int res; struct sg_request *srp = NULL; - u8 hu8arr[SZ_SG_IO_V4]; + u8 hu8arr[SZ_SG_IO_V4]; /* v4 header larger than v3 */ struct sg_io_hdr *h3p = (struct sg_io_hdr *)hu8arr; struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr; SG_LOG(3, sdp, "%s: SG_IO%s\n", __func__, ((sfp->filp->f_flags & O_NONBLOCK) ? - " O_NONBLOCK ignored" : "")); + " O_NONBLOCK ignored" : "")); res = sg_allow_if_err_recovery(sdp, false); if (unlikely(res)) return res; @@ -1757,14 +2059,18 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, return res; res = sg_wait_event_srp(sfp, p, h4p, srp); if (unlikely(res)) { - SG_LOG(1, sdp, "%s: %s=0x%p state: %s\n", __func__, - "unexpected srp", srp, - sg_rq_st_str(atomic_read(&srp->rq_st), false)); + SG_LOG(1, sdp, "%s: %s=0x%p state: %s, share: %s\n", + __func__, "unexpected srp", srp, + sg_rq_st_str(atomic_read(&srp->rq_st), false), + sg_shr_str(srp->sh_var, false)); } return res; } -/* When use_tag is true then id is a tag, else it is a pack_id. */ +/* + * When use_tag is true then id is a tag, else it is a pack_id. Returns
Returns + * valid srp if match, else returns NULL. + */ static struct sg_request * sg_match_request(struct sg_fd *sfp, bool use_tag, int id) { @@ -1819,7 +2125,7 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) if (srp) { spin_lock_irqsave(&srp->req_lck, iflags); spin_unlock(&sfp->rq_list_lock); - } else { /* assume device (not just fd) scope */ + } else if (h4p->flags & SGV4_FLAG_DEV_SCOPE) { spin_unlock(&sfp->rq_list_lock); read_lock(&sdp->sfd_llock); list_for_each_entry(o_sfp, &sdp->sfds, sfd_entry) { @@ -1836,16 +2142,18 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) spin_unlock(&o_sfp->rq_list_lock); } read_unlock(&sdp->sfd_llock); - } - if (!srp) + if (!srp) + return -ENODATA; + } else { + spin_unlock(&sfp->rq_list_lock); return -ENODATA; - + } set_bit(SG_FRQ_ABORTING, srp->frq_bm); res = 0; switch (atomic_read(&srp->rq_st)) { case SG_RS_BUSY: clear_bit(SG_FRQ_ABORTING, srp->frq_bm); - res = -EBUSY; /* shouldn't occur often */ + res = -EBUSY; /* should not occur often */ break; case SG_RS_INACTIVE: /* inactive on rq_list not good */ clear_bit(SG_FRQ_ABORTING, srp->frq_bm); @@ -1853,6 +2161,8 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) break; case SG_RS_AWAIT_RD: /* user should still do completion */ case SG_RS_DONE_RD: + case SG_RS_SHR_SWAP: + case SG_RS_SHR_SLAVE: clear_bit(SG_FRQ_ABORTING, srp->frq_bm); break; /* nothing to do here, return 0 */ case SG_RS_INFLIGHT: /* only attempt abort if inflight */ @@ -1870,12 +2180,12 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p) static int sg_idr_max_id(int id, void *p, void *data) + __must_hold(&sg_index_lock) { int *k = data; if (*k < id) *k = id; - return 0; } @@ -2031,8 +2341,10 @@ sg_fd_reshare(struct sg_fd *ma_sfp, int new_sl_fd) res = -EPROTO; /* Internal error */ rcu_read_lock(); rq_st = atomic_read(&ma_sfp->rsv_srp->rq_st); - if (rq_st != SG_RS_INACTIVE) - res = -EBUSY; + if (!(rq_st == SG_RS_INACTIVE || rq_st == SG_RS_SHR_SWAP)) + res = -EBUSY; /* master reserve buffer busy */ + if (ma_sfp->slave_srp) + res = -EBUSY; /* previous slave request not finished */ rcu_read_unlock(); if (unlikely(res)) return res; @@ -2098,6 +2410,8 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) struct sg_request *t_srp; /* other fl entries */ struct sg_device *sdp = sfp->parentdp; + if (sfp->shr_fd != SG_SHARE_FD_UNUSED) + return -EBUSY; /* this fd can't be either side of share */ rcu_read_lock(); o_srp = sfp->rsv_srp; if (!o_srp) { @@ -2106,7 +2420,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) } new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz); new_sz = max_t(int, new_sz, sfp->sgat_elem_sz); - blen = o_srp->sgat_h.buflen; + blen = o_srp->sgatp->buflen; rcu_read_unlock(); SG_LOG(3, sdp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n", __func__, blen, want_rsv_sz, new_sz, sfp->sgat_elem_sz); @@ -2129,7 +2443,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) } use_new_srp = true; list_for_each_entry(t_srp, &sfp->rq_fl, fl_entry) { - if (t_srp != o_srp && new_sz <= t_srp->sgat_h.buflen && + if (t_srp != o_srp && new_sz <= t_srp->sgatp->buflen && !SG_RS_ACTIVE(t_srp)) { /* good candidate on free list, use */ use_new_srp = false; @@ -2152,6 +2466,25 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz) return res; } +static bool +sg_any_persistent_orphans(struct sg_fd *sfp) +{ + bool res = false; + struct sg_request *srp; + + if (!test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) + return false; + rcu_read_lock(); + 
list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm)) { + res = true; + break; + } + } + rcu_read_unlock(); + return res; +} + /* * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only; @@ -2164,6 +2497,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) u32 c_flgs_wm = seip->ctl_flags_wr_mask; u32 c_flgs_rm = seip->ctl_flags_rd_mask; u32 *c_flgsp = &seip->ctl_flags; + struct sg_fd *ma_sfp = sfp->share_sfp; struct sg_device *sdp = sfp->parentdp; /* TIME_IN_NS boolean, [raw] time in nanoseconds (def: millisecs) */ @@ -2186,6 +2520,13 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) else *c_flgsp &= ~SG_CTL_FLAGM_TAG_FOR_PACK_ID; } + /* ORPHANS boolean, [ro] does this fd have any orphan requests? */ + if (c_flgs_rm & SG_CTL_FLAGM_ORPHANS) { + if (sg_any_persistent_orphans(sfp)) + *c_flgsp |= SG_CTL_FLAGM_ORPHANS; + else + *c_flgsp &= ~SG_CTL_FLAGM_ORPHANS; + } /* OTHER_OPENS boolean, [ro] any other sg open fds on this dev? */ if (c_flgs_rm & SG_CTL_FLAGM_OTHER_OPENS) { if (atomic_read(&sdp->open_cnt) > 1) @@ -2212,7 +2553,50 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip) if (c_flgs_wm & SG_CTL_FLAGM_UNSHARE) sg_chk_unshare(sfp, !!(*c_flgsp & SG_CTL_FLAGM_UNSHARE)); if (c_flgs_rm & SG_CTL_FLAGM_UNSHARE) - *c_flgsp &= ~SG_CTL_FLAGM_UNSHARE; /* clear bit */ + *c_flgsp &= ~SG_CTL_FLAGM_UNSHARE; /* clear bit */ + /* IS_SHARE boolean: [ro] true if fd may be master or slave share */ + if (c_flgs_rm & SG_CTL_FLAGM_IS_SHARE) { + if (sfp->shr_fd == SG_SHARE_FD_UNUSED) + *c_flgsp &= ~SG_CTL_FLAGM_IS_SHARE; + else + *c_flgsp |= SG_CTL_FLAGM_IS_SHARE; + } + /* IS_MASTER boolean: [ro] true if this fd may be a master share */ + if (c_flgs_rm & SG_CTL_FLAGM_IS_MASTER) { + if (sfp->shr_fd == SG_SHARE_FD_MASTER) + *c_flgsp |= SG_CTL_FLAGM_IS_MASTER; + else + *c_flgsp &= ~SG_CTL_FLAGM_IS_MASTER; + } + /* + * MASTER_FINI boolean, [rbw] should be called by slave; when + * reading: master is finished, awaiting action by slave; + * when written: 1 --> slave doesn't want to continue + */ + if (c_flgs_rm & SG_CTL_FLAGM_MASTER_FINI) { + if (ma_sfp && ma_sfp->rsv_srp) { + struct sg_request *res_srp = ma_sfp->rsv_srp; + + if (atomic_read(&res_srp->rq_st) == SG_RS_SHR_SWAP) + *c_flgsp |= SG_CTL_FLAGM_MASTER_FINI; + else + *c_flgsp &= ~SG_CTL_FLAGM_MASTER_FINI; + } else { + *c_flgsp &= ~SG_CTL_FLAGM_MASTER_FINI; + } + } + if (c_flgs_wm & SG_CTL_FLAGM_MASTER_FINI) { + bool ma_fini_wm = !!(*c_flgsp & SG_CTL_FLAGM_MASTER_FINI); + + sg_change_after_master_rq(sfp, ma_fini_wm); + } + /* MASTER_ERR boolean, [ro] share: master finished with error */ + if (c_flgs_rm & SG_CTL_FLAGM_MASTER_ERR) { + if (ma_sfp && test_bit(SG_FFD_MASTER_ERR, ma_sfp->ffd_bm)) + *c_flgsp |= SG_CTL_FLAGM_MASTER_ERR; + else + *c_flgsp &= ~SG_CTL_FLAGM_MASTER_ERR; + } /* NO_DURATION boolean, [rbw] */ if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION) flg = test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm); @@ -2399,7 +2783,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p) } if (s_rd_mask & SG_SEIM_RESERVED_SIZE) seip->reserved_sz = (u32)min_t(int, - sfp->rsv_srp->sgat_h.buflen, + sfp->rsv_srp->sgatp->buflen, sdp->max_sgat_sz); /* copy to user space if int or boolean read mask non-zero */ if (s_rd_mask || seip->ctl_flags_rd_mask) { @@ -2553,8 +2937,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) if 
(test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) { list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { if (SG_RS_AWAIT_READ(srp) && - !test_bit(SG_FRQ_SYNC_INVOC, - srp->frq_bm)) { + !SG_IS_SYNC_INVOC(srp)) { val = srp->tag; break; } @@ -2562,8 +2945,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) } else { list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { if (SG_RS_AWAIT_READ(srp) && - !test_bit(SG_FRQ_SYNC_INVOC, - srp->frq_bm)) { + !SG_IS_SYNC_INVOC(srp)) { val = srp->pack_id; break; } @@ -2597,7 +2979,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) return res; case SG_GET_RESERVED_SIZE: mutex_lock(&sfp->f_mutex); - val = min_t(int, sfp->rsv_srp->sgat_h.buflen, + val = min_t(int, sfp->rsv_srp->sgatp->buflen, sdp->max_sgat_sz); SG_LOG(3, sdp, "%s: SG_GET_RESERVED_SIZE=%d\n", __func__, val); @@ -2774,11 +3156,10 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg) if (sdev->host->hostt->compat_ioctl) { int ret; - ret = sdev->host->hostt->compat_ioctl(sdev, cmd_in, (void __user *)arg); - + ret = sdev->host->hostt->compat_ioctl(sdev, cmd_in, + uptr64(arg)); return ret; } - return -ENOIOCTLCMD; } #endif @@ -2860,7 +3241,7 @@ sg_vma_fault(struct vm_fault *vmf) goto out_err; } spin_lock_irqsave(&srp->req_lck, iflags); - rsv_schp = &srp->sgat_h; + rsv_schp = srp->sgatp; offset = vmf->pgoff << PAGE_SHIFT; if (unlikely(offset >= (unsigned int)rsv_schp->buflen)) { SG_LOG(1, sdp, "%s: offset[%lu] >= rsv.buflen\n", __func__, @@ -2928,7 +3309,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma) ret = -EBUSY; goto out; } - rsv_schp = &srp->sgat_h; + rsv_schp = srp->sgatp; if (unlikely(req_sz > (unsigned long)rsv_schp->buflen)) { ret = -ENOMEM; goto out; @@ -3072,9 +3453,11 @@ sg_rq_end_io(struct request *rq, blk_status_t status) srp->in_resid = a_resid; } } + if (test_bit(SG_FRQ_ABORTING, srp->frq_bm) && srp->rq_result == 0) + srp->rq_result |= (DRIVER_HARD << 24); - SG_LOG(6, sdp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id, - srp->rq_result); + SG_LOG(6, sdp, "%s: pack_id=%d, tag=%d, res=0x%x\n", __func__, + srp->pack_id, srp->tag, srp->rq_result); if (srp->start_ns > 0) /* zero only when SG_FFD_NO_DURATION is set */ srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm)); @@ -3083,7 +3466,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status) if (slen > 0) { if (scsi_rp->sense) { srp->sense_bp = kzalloc(SCSI_SENSE_BUFFERSIZE, - GFP_ATOMIC); + GFP_ATOMIC /* <-- leave */); if (srp->sense_bp) memcpy(srp->sense_bp, scsi_rp->sense, slen); } else { @@ -3102,7 +3485,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status) } spin_unlock(&srp->req_lck); } - if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) + if (!SG_IS_SYNC_INVOC(srp)) atomic_inc(&sfp->waiting); if (unlikely(sg_rstate_chg(srp, SG_RS_INFLIGHT, rqq_state))) pr_warn("%s: can't set rq_st\n", __func__); @@ -3250,7 +3633,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) goto cdev_add_err; sdp->cdev = cdev; - if (sg_sysfs_valid) { + if (likely(sg_sysfs_valid)) { struct device *sg_class_member; sg_class_member = device_create(sg_sysfs_class, cl_dev->parent, @@ -3264,7 +3647,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf) } error = sysfs_create_link(&scsidp->sdev_gendev.kobj, &sg_class_member->kobj, "generic"); - if (error) + if (unlikely(error)) pr_err("%s: unable to make symlink 'generic' back " "to sg%d\n", __func__, sdp->index); } else @@ -3274,7 +3657,6 @@ sg_add_device(struct 
device *cl_dev, struct class_interface *cl_intf) "type %d\n", sdp->index, scsidp->type); dev_set_drvdata(cl_dev, sdp); - return 0; cdev_add_err: @@ -3294,16 +3676,16 @@ static void sg_device_destroy(struct kref *kref) { struct sg_device *sdp = container_of(kref, struct sg_device, d_ref); - unsigned long flags; + unsigned long iflags; /* CAUTION! Note that the device can still be found via idr_find() * even though the refcount is 0. Therefore, do idr_remove() BEFORE * any other cleanup. */ - write_lock_irqsave(&sg_index_lock, flags); + write_lock_irqsave(&sg_index_lock, iflags); idr_remove(&sg_index_idr, sdp->index); - write_unlock_irqrestore(&sg_index_lock, flags); + write_unlock_irqrestore(&sg_index_lock, iflags); SG_LOG(3, sdp, "%s: sdp=0x%p --\n", __func__, sdp); @@ -3356,9 +3738,10 @@ MODULE_LICENSE("GPL"); MODULE_VERSION(SG_VERSION_STR); MODULE_ALIAS_CHARDEV_MAJOR(SCSI_GENERIC_MAJOR); -MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element " - "size (default: max(SG_SCATTER_SZ, PAGE_SIZE))"); -MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd"); +MODULE_PARM_DESC + (scatter_elem_sz, + "scatter gather element size (def: max(SG_SCATTER_SZ, PAGE_SIZE))"); +MODULE_PARM_DESC(def_reserved_size, "size of request reserved for each fd"); MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))"); static int __init @@ -3424,6 +3807,33 @@ exit_sg(void) idr_destroy(&sg_index_idr); } +/* + * Setup to move data between kernel buffers managed by this driver and a SCSI + * device. Note that there is no corresponding 'unmap' call as is required by + * blk_rq_map_user() . blk_rq has already been told direction of transfer. + */ +static int +sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, + struct request *rq) +{ + int k, pg_sz, dlen, ln; + int res = 0; + struct sg_scatter_hold *schp = srp->sgatp; + + dlen = schp->dlen; + pg_sz = 1 << (PAGE_SHIFT + schp->page_order); + SG_LOG(4, srp->parentfp->parentdp, "%s: dlen=%d, pg_sz=%d\n", + __func__, dlen, pg_sz); + for (k = 0; k < schp->num_sgat && dlen > 0; ++k, dlen -= ln) { + ln = min_t(int, dlen, pg_sz); + res = blk_rq_map_kern(q, rq, page_address(schp->pages[k]), + ln, GFP_ATOMIC); + if (unlikely(res)) + break; + } + return res; +} + static inline bool sg_chk_dio_allowed(struct sg_device *sdp, struct sg_fd *sfp, struct sg_request *srp, int iov_count, int dir) @@ -3543,7 +3953,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) reserved = (sfp->rsv_srp == srp); rq->end_io_data = srp; scsi_rp->retries = SG_DEFAULT_RETRIES; - req_schp = &srp->sgat_h; + req_schp = srp->sgatp; if (dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE) { SG_LOG(4, sdp, "%s: no data xfer [0x%p]\n", __func__, srp); @@ -3602,6 +4012,9 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res) SG_LOG(1, sdp, "%s: blk_rq_map_user() res=%d\n", __func__, res); + } else { + /* transfer data to/from kernel buffers */ + res = sg_rq_map_kern(srp, q, rq); } fini: if (likely(res == 0)) { @@ -3617,8 +4030,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) blk_put_request(rq); } } - SG_LOG((res ? 1 : 4), sdp, "%s: %s res=%d [0x%p]\n", __func__, cp, - res, srp); + SG_LOG((res ? 
1 : 4), sdp, "%s: %s %s res=%d [0x%p]\n", __func__, + sg_shr_str(srp->sh_var, false), cp, res, srp); return res; } @@ -3637,7 +4050,7 @@ sg_finish_scsi_blk_rq(struct sg_request *srp) SG_LOG(4, sfp->parentdp, "%s: srp=0x%p%s\n", __func__, srp, (srp->parentfp->rsv_srp == srp) ? " rsv" : ""); - if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) { + if (!SG_IS_SYNC_INVOC(srp)) { atomic_dec(&sfp->submitted); atomic_dec(&sfp->waiting); } @@ -3681,7 +4094,7 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen) gfp_t mask_ap = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO; gfp_t mask_kz = GFP_ATOMIC | __GFP_NOWARN; struct sg_device *sdp = sfp->parentdp; - struct sg_scatter_hold *schp = &srp->sgat_h; + struct sg_scatter_hold *schp = srp->sgatp; if (unlikely(m_size < 0)) return -EFAULT; @@ -3789,7 +4202,7 @@ sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer) { int k, num, res; struct page *pgp; - struct sg_scatter_hold *schp = &srp->sgat_h; + struct sg_scatter_hold *schp = srp->sgatp; SG_LOG(4, srp->parentfp->parentdp, "%s: num_xfer=%d\n", __func__, num_xfer); @@ -3842,7 +4255,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag) rcu_read_lock(); if (unlikely(search_for_1)) { list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { - if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) + if (SG_IS_SYNC_INVOC(srp)) continue; if (is_tag) { if (srp->tag != id) @@ -3863,6 +4276,8 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag) bad_sr_st = atomic_read(&srp->rq_st); } break; + case SG_RS_SHR_SLAVE: + goto good; case SG_RS_INFLIGHT: break; default: @@ -3876,7 +4291,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag) } } else { /* search for any request is more likely */ list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { - if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) + if (SG_IS_SYNC_INVOC(srp)) continue; sr_st = atomic_read(&srp->rq_st); if (sr_st == SG_RS_AWAIT_RD) { @@ -3932,8 +4347,10 @@ sg_mk_srp(struct sg_fd *sfp, bool first) if (srp) { spin_lock_init(&srp->req_lck); atomic_set(&srp->rq_st, SG_RS_INACTIVE); + srp->sh_var = SG_SHR_NONE; srp->parentfp = sfp; srp->tag = SG_TAG_WILDCARD; + srp->sgatp = &srp->sgat_h; /* only slave share changes sgatp */ return srp; } else { return ERR_PTR(-ENOMEM); @@ -3962,7 +4379,7 @@ sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len) * Irrespective of the given reserve request size, the minimum size requested * will be PAGE_SIZE (often 4096 bytes). Returns a pointer to reserve object or * a negated errno value twisted by ERR_PTR() macro. The actual number of bytes - * allocated (maybe less than buflen) is in srp->sgat_h.buflen . Note that this + * allocated (maybe less than buflen) is in srp->sgatp->buflen . Note that this * function is only called in contexts where locking is not required. */ static struct sg_request * @@ -4005,31 +4422,130 @@ sg_build_reserve(struct sg_fd *sfp, int buflen) * failure returns a negated errno value twisted by ERR_PTR() macro. 
*/ static struct sg_request * -sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) +sg_add_request(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len) { bool act_empty = false; + bool allow_rsv = true; bool mk_new_srp = true; + bool sl_req = false; u32 sum_dlen; unsigned long iflags; enum sg_rq_state sr_st; + enum sg_rq_state ma_sr_st; struct sg_fd *fp = cwrp->sfp; struct sg_request *r_srp = NULL; /* request to return */ struct sg_request *rsv_srp; /* current fd's reserve request */ + struct sg_request *uninitialized_var(ma_rsv_srp); + struct sg_fd *uninitialized_var(ma_sfp); __maybe_unused struct sg_device *sdp; __maybe_unused const char *cp; + char b[48]; spin_lock_irqsave(&fp->rq_list_lock, iflags); + b[0] = '\0'; sdp = fp->parentdp; rsv_srp = fp->rsv_srp; + + switch (sh_var) { + case SG_SHR_NONE: + case SG_SHR_SL_FD_NOT_RQ: + break; + case SG_SHR_MA_RQ: + sr_st = atomic_read(&rsv_srp->rq_st); + if (sr_st == SG_RS_INACTIVE) { + if (likely(sg_rstate_chg(rsv_srp, sr_st, + SG_RS_BUSY) == 0)) { + r_srp = rsv_srp; + mk_new_srp = false; + cp = "ma_rq"; + goto good_fini; + } + } + r_srp = ERR_PTR(-EBUSY); + break; + case SG_SHR_MA_FD_NOT_RQ: + allow_rsv = false; + break; + case SG_SHR_SL_RQ: + ma_sfp = fp->share_sfp; + if (!ma_sfp) { + r_srp = ERR_PTR(-EPROTO); + break; + } + /* + * Contention here may be with another potential slave trying + * to pair with this master. The loser will receive an + * EADDRINUSE errno. The winner advances master's rq_state: + * SG_RS_SHR_SWAP --> SG_RS_SHR_SLAVE + */ + ma_rsv_srp = ma_sfp->rsv_srp; + ma_sr_st = atomic_read(&ma_rsv_srp->rq_st); + switch (ma_sr_st) { + case SG_RS_AWAIT_RD: + case SG_RS_DONE_RD: + if (ma_rsv_srp->rq_result & SG_ML_RESULT_MSK) { + r_srp = ERR_PTR(-ENOSTR); + break; + } + /* fallthrough */ + case SG_RS_SHR_SWAP: + sl_req = true; + if (ma_sr_st == SG_RS_AWAIT_RD) + break; + if (unlikely(sg_rstate_chg(ma_rsv_srp, ma_sr_st, + SG_RS_SHR_SLAVE) != 0)) + r_srp = ERR_PTR(-EADDRINUSE); + break; + case SG_RS_INFLIGHT: + sl_req = true; + break; + case SG_RS_INACTIVE: + r_srp = ERR_PTR(-EADDRNOTAVAIL); + break; + case SG_RS_BUSY: + r_srp = ERR_PTR(-EBUSY); + break; + case SG_RS_SHR_SLAVE: + default: + r_srp = ERR_PTR(-EADDRINUSE); + break; + } + break; + } + if (IS_ERR(r_srp)) { + if (sh_var == SG_SHR_MA_RQ) + snprintf(b, sizeof(b), "SG_SHR_MA_RQ --> sr_st=%s", + sg_rq_st_str(sr_st, false)); + else if (sh_var == SG_SHR_SL_RQ && ma_sfp) + snprintf(b, sizeof(b), "SG_SHR_SL_RQ-->ma_sr_st=%s", + sg_rq_st_str(ma_sr_st, false)); + else + snprintf(b, sizeof(b), "sh_var=%s", + sg_shr_str(sh_var, false)); + goto err_out; + } cp = ""; + if (sl_req) { /* slave dlen may be smaller than master's dlen */ + if (dxfr_len > ma_rsv_srp->sgatp->dlen) { + SG_LOG(4, sdp, "%s: slave dlen [%d] > master dlen\n", + __func__, dxfr_len); + r_srp = ERR_PTR(-E2BIG); + goto err_out; + } + dxfr_len = 0; /* any srp for slave will do, pick smallest */ + } + /* - * Check the free list (fl) for candidates. Pick zero data length + * Check the free list (fl) for candidates. If SG_SHR_MA_FD_NOT_RQ + * then do not re-use the reserve request. Pick zero data length * requests from the back of the fl, the rest from the front. 
*/ - if (list_empty(&fp->rq_fl)) { + if (list_empty(&fp->rq_fl)) act_empty = true; - } else if (dxfr_len < 1) { /* 0 data length requests at back of fl */ + else if (dxfr_len < 1) { /* 0 data length requests at back of fl */ list_for_each_entry_reverse(r_srp, &fp->rq_fl, fl_entry) { + if (!allow_rsv && rsv_srp == r_srp) + continue; sr_st = atomic_read(&r_srp->rq_st); if (sr_st == SG_RS_INACTIVE) { if (likely(sg_rstate_chg(r_srp, sr_st, @@ -4042,9 +4558,11 @@ sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) } } else { /* find request with large enough dlen */ list_for_each_entry(r_srp, &fp->rq_fl, fl_entry) { + if (!allow_rsv && rsv_srp == r_srp) + continue; sr_st = atomic_read(&r_srp->rq_st); if (sr_st == SG_RS_INACTIVE && - r_srp->sgat_h.buflen >= dxfr_len) { + r_srp->sgatp->buflen >= dxfr_len) { if (likely(sg_rstate_chg(r_srp, sr_st, SG_RS_BUSY) == 0)) { cp = "from front of fl"; @@ -4055,6 +4573,7 @@ sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) } } +good_fini: if (mk_new_srp) { /* Need new sg_request object */ bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, fp->ffd_bm); @@ -4090,20 +4609,27 @@ sg_add_request(struct sg_comm_wr_t *cwrp, int dxfr_len) if (!mk_new_srp) spin_lock(&r_srp->req_lck); r_srp->frq_bm[0] = cwrp->frq_bm[0]; /* assumes <= 32 req flags */ - r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */ + r_srp->sgatp->dlen = dxfr_len;/* must be <= r_srp->sgatp->buflen */ + r_srp->sh_var = sh_var; r_srp->cmd_opcode = 0xff; /* set invalid opcode (VS), 0x0 is TUR */ /* If setup stalls (e.g. blk_get_request()) debug shows 'elap=1 ns' */ if (test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm)) r_srp->start_ns = U64_MAX; + if (sl_req && ma_rsv_srp) { + ma_sfp->slave_srp = r_srp; + /* slave "shares" the master reserve request's data buffer */ + r_srp->sgatp = &ma_rsv_srp->sgat_h; + } if (mk_new_srp) spin_lock_irqsave(&fp->rq_list_lock, iflags); list_add_tail_rcu(&r_srp->rq_entry, &fp->rq_list); if (!mk_new_srp) spin_unlock(&r_srp->req_lck); +err_out: spin_unlock_irqrestore(&fp->rq_list_lock, iflags); err_no_lock: - if (IS_ERR(r_srp)) - SG_LOG(1, sdp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp)); + if (IS_ERR(r_srp) && b[0]) + SG_LOG(1, sdp, "%s: bad %s\n", __func__, b); if (!IS_ERR(r_srp)) SG_LOG(4, sdp, "%s: %s %sr_srp=0x%p\n", __func__, cp, ((r_srp == rsv_srp) ? "[rsv] " : ""), r_srp); @@ -4122,10 +4648,11 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) bool on_fl = false; int dlen, buflen; unsigned long iflags; + enum sg_rq_state sr_st; u8 *sbp; struct sg_request *t_srp; struct sg_scatter_hold *schp; - const char *cp = "head"; + __maybe_unused const char *cp = "head"; if (WARN_ON(!sfp || !srp)) return; @@ -4133,7 +4660,9 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) spin_lock_irqsave(&sfp->rq_list_lock, iflags); sbp = srp->sense_bp; srp->sense_bp = NULL; - atomic_set(&srp->rq_st, SG_RS_BUSY); + sr_st = atomic_read(&srp->rq_st); + if (sr_st != SG_RS_SHR_SWAP) /* mark _BUSY then _INACTIVE at end */ + atomic_set(&srp->rq_st, SG_RS_BUSY); list_del_rcu(&srp->rq_entry); kfree(sbp); /* maybe orphaned req, thus never read */ /* @@ -4166,7 +4695,10 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp) SG_LOG(5, sfp->parentdp, "%s: %ssrp=0x%p move to fl %s\n", __func__, ((sfp->rsv_srp == srp) ? 
"rsv " : ""), srp, cp); spin_unlock_irqrestore(&sfp->rq_list_lock, iflags); - atomic_set(&srp->rq_st, SG_RS_INACTIVE); + if (sr_st != SG_RS_SHR_SWAP) { + atomic_set(&srp->rq_st, SG_RS_INACTIVE); + srp->tag = SG_TAG_WILDCARD; + } } /* Returns pointer to sg_fd object or negated errno twisted by ERR_PTR */ @@ -4218,6 +4750,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp) SG_LOG(1, sdp, "%s: detaching\n", __func__); return ERR_PTR(-ENODEV); } + sg_unshare_fds(sfp, NULL); if (unlikely(sg_big_buff != def_reserved_size)) sg_big_buff = def_reserved_size; @@ -4231,11 +4764,11 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp) -err); return ERR_PTR(err); } - if (srp->sgat_h.buflen < rbuf_len) { + if (srp->sgatp->buflen < rbuf_len) { reduced = true; SG_LOG(2, sdp, "%s: reserve reduced from %d to buflen=%d\n", - __func__, rbuf_len, srp->sgat_h.buflen); + __func__, rbuf_len, srp->sgatp->buflen); } /* will be first element so head or tail doesn't matter */ list_add_tail_rcu(&srp->fl_entry, &sfp->rq_fl); @@ -4284,7 +4817,7 @@ sg_remove_sfp_usercontext(struct work_struct *work) continue; sg_finish_scsi_blk_rq(srp); list_del(&srp->rq_entry); - if (srp->sgat_h.buflen > 0) + if (srp->sgatp->buflen > 0) sg_remove_sgat(srp); kfree(srp->sense_bp); /* abnormal close: device detached */ SG_LOG(6, sdp, "%s:%s%p --\n", __func__, cp, srp); @@ -4296,7 +4829,7 @@ sg_remove_sfp_usercontext(struct work_struct *work) if (!srp) continue; list_del(&srp->fl_entry); - if (srp->sgat_h.buflen > 0) + if (srp->sgatp->buflen > 0) sg_remove_sgat(srp); kfree(srp->sense_bp); SG_LOG(6, sdp, "%s: fl%s%p --\n", __func__, cp, srp); @@ -4318,7 +4851,14 @@ sg_remove_sfp(struct kref *kref) unsigned long iflags; struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref); struct sg_device *sdp = sfp->parentdp; + struct sg_fd *o_sfp = (sfp->shr_fd != SG_SHARE_FD_UNUSED) ? + sfp->share_sfp : NULL; + if (o_sfp && !test_bit(SG_FFD_RELEASE, o_sfp->ffd_bm)) { + mutex_lock(&o_sfp->f_mutex); + sg_remove_sfp_share(sfp); + mutex_unlock(&o_sfp->f_mutex); + } write_lock_irqsave(&sdp->sfd_llock, iflags); list_del(&sfp->sfd_entry); write_unlock_irqrestore(&sdp->sfd_llock, iflags); @@ -4327,9 +4867,9 @@ sg_remove_sfp(struct kref *kref) schedule_work(&sfp->ew_fd.work); } -/* must be called with sg_index_lock held */ static struct sg_device * sg_lookup_dev(int dev) + __must_hold(&sg_index_lock) { return idr_find(&sg_index_idr, dev); } @@ -4374,19 +4914,50 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str) return long_str ? "done_read" : "fin"; case SG_RS_BUSY: return long_str ? "busy" : "bsy"; + case SG_RS_SHR_SWAP: /* only an active master has this */ + return long_str ? "share swap" : "s_wp"; + case SG_RS_SHR_SLAVE: /* only an active master has this */ + return long_str ? "share slave active" : "sl_a"; + default: + return long_str ? "unknown" : "unk"; + } +} + +static const char * +sg_shr_str(enum sg_shr_var sh_var, bool long_str) +{ + switch (sh_var) { /* share variety of request */ + case SG_SHR_NONE: + return long_str ? "none" : "-"; + case SG_SHR_MA_RQ: + return long_str ? "master request" : "m_rq"; + case SG_SHR_MA_FD_NOT_RQ: + return long_str ? "master fd, not request" : "m_nr"; + case SG_SHR_SL_RQ: + return long_str ? "slave request" : "s_rq"; + case SG_SHR_SL_FD_NOT_RQ: + return long_str ? "slave fd, not request" : "s_nr"; default: return long_str ? 
"unknown" : "unk"; } } + #else + static const char * sg_rq_st_str(enum sg_rq_state rq_st, bool long_str) { return ""; } + +static const char * +sg_shr_str(enum sg_shr_var sh_var, bool long_str) +{ + return ""; +} #endif -#if IS_ENABLED(CONFIG_SCSI_PROC_FS) /* long, almost to end of file */ +#if IS_ENABLED(CONFIG_SCSI_PROC_FS) /* long, almost to end of file */ static int sg_proc_seq_show_int(struct seq_file *s, void *v); static int sg_proc_single_open_adio(struct inode *inode, struct file *filp); @@ -4402,8 +4973,9 @@ static const struct file_operations adio_fops = { }; static int sg_proc_single_open_dressz(struct inode *inode, struct file *filp); -static ssize_t sg_proc_write_dressz(struct file *filp, - const char __user *buffer, size_t count, loff_t *off); +static ssize_t sg_proc_write_dressz(struct file *filp, + const char __user *buffer, size_t count, + loff_t *off); static const struct file_operations dressz_fops = { .owner = THIS_MODULE, .open = sg_proc_single_open_dressz, @@ -4416,6 +4988,7 @@ static const struct file_operations dressz_fops = { static int sg_proc_seq_show_version(struct seq_file *s, void *v); static int sg_proc_seq_show_devhdr(struct seq_file *s, void *v); static int sg_proc_seq_show_dev(struct seq_file *s, void *v); + static void * dev_seq_start(struct seq_file *s, loff_t *pos); static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos); static void dev_seq_stop(struct seq_file *s, void *v); @@ -4534,7 +5107,7 @@ sg_proc_write_dressz(struct file *filp, const char __user *buffer, sg_big_buff = k; return count; } - return -ERANGE; + return -EDOM; } static int @@ -4553,8 +5126,8 @@ sg_proc_seq_show_devhdr(struct seq_file *s, void *v) } struct sg_proc_deviter { - loff_t index; - size_t max; + loff_t index; + size_t max; int fd_index; }; @@ -4569,7 +5142,7 @@ dev_seq_start(struct seq_file *s, loff_t *pos) it->index = *pos; it->max = sg_last_dev(); - if (it->index >= it->max) + if (it->index >= (int)it->max) return NULL; return it; } @@ -4605,11 +5178,11 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v) scsidp = sdp->device; seq_printf(s, "%d\t%d\t%d\t%llu\t%d\t%d\t%d\t%d\t%d\n", scsidp->host->host_no, scsidp->channel, - scsidp->id, scsidp->lun, (int) scsidp->type, + scsidp->id, scsidp->lun, (int)scsidp->type, 1, - (int) scsidp->queue_depth, - (int) atomic_read(&scsidp->device_busy), - (int) scsi_device_online(scsidp)); + (int)scsidp->queue_depth, + (int)atomic_read(&scsidp->device_busy), + (int)scsi_device_online(scsidp)); } read_unlock_irqrestore(&sg_index_lock, iflags); return 0; @@ -4661,8 +5234,8 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp, rq_st = atomic_read(&srp->rq_st); dur = sg_get_dur(srp, &rq_st, t_in_ns, &is_dur); n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp, - sg_rq_st_str(rq_st, false), srp->sgat_h.dlen, - srp->sgat_h.buflen, (int)srp->pack_id); + sg_rq_st_str(rq_st, false), srp->sgatp->dlen, + srp->sgatp->buflen, (int)srp->pack_id); if (is_dur) /* cmd/req has completed, waiting for ... 
*/ n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp); else if (dur < U32_MAX) { /* in-flight or busy (so ongoing) */ @@ -4673,8 +5246,11 @@ sg_proc_dbg_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp, n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s", to / 1000, dur, tp); } + if (srp->sh_var != SG_SHR_NONE) + n += scnprintf(obp + n, len - n, " shr=%s", + sg_shr_str(srp->sh_var, false)); n += scnprintf(obp + n, len - n, " sgat=%d op=0x%02x\n", - srp->sgat_h.num_sgat, srp->cmd_opcode); + srp->sgatp->num_sgat, srp->cmd_opcode); return n; } @@ -4686,16 +5262,22 @@ sg_proc_dbg_fd(struct sg_fd *fp, char *obp, int len, bool reduced) bool first_fl; int n = 0; int to; + const char *cp; struct sg_request *srp; + if (fp->shr_fd == SG_SHARE_FD_UNUSED) + cp = ""; + else + cp = (fp->shr_fd == SG_SHARE_FD_MASTER) ? + " shr_mast" : " shr_slv"; /* sgat=-1 means unavailable */ to = jiffies_to_msecs(fp->timeout); if (to % 1000) n += scnprintf(obp + n, len - n, "timeout=%dms rs", to); else n += scnprintf(obp + n, len - n, "timeout=%ds rs", to / 1000); - n += scnprintf(obp + n, len - n, "v_buflen=%d\n cmd_q=%d ", - fp->rsv_srp->sgat_h.buflen, + n += scnprintf(obp + n, len - n, "v_buflen=%d%s\n cmd_q=%d ", + fp->rsv_srp->sgatp->buflen, cp, (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm)); n += scnprintf(obp + n, len - n, "f_packid=%d k_orphan=%d ffd_bm=0x%lx\n", @@ -4803,10 +5385,10 @@ sg_proc_seq_show_dbg(struct seq_file *s, void *v, bool reduced) if (!list_empty(&sdp->sfds)) { found = true; disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?"); - if (atomic_read(&sdp->detaching)) + if (atomic_read(&sdp->detaching)) { snprintf(b1, sizeof(b1), " >>> device=%s %s\n", disk_name, "detaching pending close\n"); - else if (sdp->device) { + } else if (sdp->device) { n = sg_proc_dbg_sdev(sdp, bp, bp_len, fdi_p, reduced); if (n >= bp_len - 1) { trunc = true; diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index a3fa26644496..19d7321e7df6 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -110,6 +110,9 @@ typedef struct sg_io_hdr { #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */ +#define SGV4_FLAG_DEV_SCOPE 0x1000 /* permit SG_IOABORT to have wider scope */ +#define SGV4_FLAG_SHARE 0x2000 /* share IO buffer; needs SG_SEIM_SHARE_FD */ +#define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* needed for sharing */ /* Output (potentially OR-ed together) in v3::info or v4::info field */ #define SG_INFO_OK_MASK 0x1 @@ -181,7 +184,12 @@ typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */ #define SG_CTL_FLAGM_OTHER_OPENS 0x4 /* rd: other sg fd_s on this dev */ #define SG_CTL_FLAGM_ORPHANS 0x8 /* rd: orphaned requests on this fd */ #define SG_CTL_FLAGM_Q_TAIL 0x10 /* used for future cmds on this fd */ +#define SG_CTL_FLAGM_IS_SHARE 0x20 /* rd: fd is master or slave share */ +#define SG_CTL_FLAGM_IS_MASTER 0x40 /* rd: this fd is share master */ #define SG_CTL_FLAGM_UNSHARE 0x80 /* undo share after inflight cmd */ +/* rd> 1: master finished 0: not; wr> 1: finish share post master */ +#define SG_CTL_FLAGM_MASTER_FINI 0x100 /* wr> 0: setup for repeat slave req */ +#define SG_CTL_FLAGM_MASTER_ERR 0x200 /* rd: sharing, master got error */ #define SG_CTL_FLAGM_NO_DURATION 0x400 /* don't calc command duration */ #define SG_CTL_FLAGM_MORE_ASYNC 0x800 /* yield EAGAIN in more cases */ #define SG_CTL_FLAGM_ALL_BITS 0xfff /* should be OR of previous items */ From 
patchwork Fri May 24 18:48:07 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960347
From: Douglas Gilbert
To: linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com
Subject: [PATCH 17/19] sg: add multiple request support
Date: Fri, 24 May 2019 14:48:07 -0400
Message-Id: <20190524184809.25121-18-dgilbert@interlog.com>
In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com>
References: <20190524184809.25121-1-dgilbert@interlog.com>

Before the write() and read() system calls were removed from the bsg driver (around lk 4.15), bsg supported multiple SCSI requests being submitted in a single invocation. It did this by passing an array of struct sg_io_v4 objects to write(), whose third argument (the byte count of the buffer that the second argument points to) is then a multiple of sizeof(sg_io_v4). Doing the same with ioctl(SG_IOSUBMIT) is not practical since an ioctl() has no "length of passed object" argument. Further, the _IOWR macro used to generate the ioctl number for SG_IOSUBMIT encodes the expected length of the passed object, and that is the size of a _single_ struct sg_io_v4 object. So an indirect approach is taken: any object passed to ioctl(SG_IO), ioctl(SG_IOSUBMIT) and ioctl(SG_IORECEIVE) with SGV4_FLAG_MULTIPLE_REQS set is interpreted as a "controlling object". It is parsed differently from other struct sg_io_v4 objects.
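To make the indirect approach concrete, here is a minimal user-space sketch that builds one such controlling object and submits two commands with a single ioctl(). This is illustrative code, not part of the patch: the submit_pair() wrapper and the all-zero TEST UNIT READY cdbs are invented for the example, SG_IOSUBMIT and SGV4_FLAG_MULTIPLE_REQS are assumed to come from the v4 sg.h that this series proposes, and error handling is minimal.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/bsg.h>	/* struct sg_io_v4 */
#include <scsi/sg.h>	/* assumes the v4 additions from this series */

/* Submit two TEST UNIT READY cdbs in one ioctl(SG_IOSUBMIT) invocation.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int
submit_pair(int sg_fd)
{
	static uint8_t tur[2][6];	/* all-zero cdb is TEST UNIT READY */
	struct sg_io_v4 rq[2];		/* the "normal" request objects */
	struct sg_io_v4 ctl;		/* the controlling object */
	int k;

	memset(rq, 0, sizeof(rq));
	memset(&ctl, 0, sizeof(ctl));
	for (k = 0; k < 2; ++k) {
		rq[k].guard = 'Q';	/* each element needs the 'Q' guard */
		rq[k].request = (uint64_t)(uintptr_t)tur[k];
		rq[k].request_len = sizeof(tur[k]);
		rq[k].timeout = 20000;	/* milliseconds */
	}
	ctl.guard = 'Q';
	ctl.flags = SGV4_FLAG_MULTIPLE_REQS;
	/* the controlling object's data-out buffer carries the array in */
	ctl.dout_xferp = (uint64_t)(uintptr_t)rq;
	ctl.dout_xfer_len = sizeof(rq);	/* multiple of sizeof(struct sg_io_v4) */
	/* its data-in buffer is where the updated array is written back */
	ctl.din_xferp = (uint64_t)(uintptr_t)rq;
	ctl.din_xfer_len = sizeof(rq);
	return ioctl(sg_fd, SG_IOSUBMIT, &ctl);
}

With neither SGV4_FLAG_IMMED nor O_NONBLOCK in effect, such a call returns only after all the requests have completed, and the updated array (per-request status, residuals and SG_INFO_MRQ_FINI in each info field) has been copied back through din_xferp.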
Its data-out buffer contains an array of "normal" struct sg_io_v4 objects. Multiple requests can be combined with shared file descriptors with SGV4_FLAG_DO_ON_OTHER indicating the other file descriptor (in the share) is to be used for the command it appears with. Multiple requests can be combined with shared requests. As a further optimisation, an array of SCSI commands can be passed from the user space via the controlling object's request "pointer". Without that, the multiple request logic would need to visit the user space once per command to pick up each SCSI command (cdb). See the webpage at: http://sg.danny.cz/sg/sg_v40.html in the section titled: "9 Multiple requests" Signed-off-by: Douglas Gilbert --- drivers/scsi/sg.c | 616 ++++++++++++++++++++++++++++++++++++++++- include/uapi/scsi/sg.h | 11 +- 2 files changed, 624 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 9adca3988c58..92a0226cabc2 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -62,6 +62,7 @@ static char *sg_version_date = "20190520"; #define SG_ALLOW_DIO_DEF 0 #define SG_MAX_DEVS 32768 +#define SG_MAX_MULTI_REQ_SZ (2 * 1024 * 1024) /* Comment out the following line to compile out SCSI_LOGGING stuff */ #define SG_DEBUG 1 @@ -132,6 +133,7 @@ enum sg_shr_var { #define SG_FRQ_NO_US_XFER 4 /* no user space transfer of data */ #define SG_FRQ_ABORTING 5 /* in process of aborting this cmd */ #define SG_FRQ_DEACT_ORPHAN 7 /* not keeping orphan so de-activate */ +#define SG_FRQ_MULTI_REQ 8 /* part of a multiple request series */ #define SG_FRQ_BLK_PUT_REQ 9 /* set when blk_put_request() called */ /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */ @@ -196,6 +198,7 @@ struct sg_slice_hdr4 { /* parts of sg_io_v4 object needed in async usage */ s16 dir; /* data xfer direction; SG_DXFER_* */ u16 cmd_len; /* truncated of sg_io_v4::request_len */ u16 max_sb_len; /* truncated of sg_io_v4::max_response_len */ + u16 mrq_ind; /* position in parentfp->mrq_arr */ }; struct sg_scatter_hold { /* holding area for scsi scatter gather info */ @@ -311,6 +314,10 @@ static int sg_v3_receive(struct sg_fd *sfp, struct sg_request *srp, static int sg_v3_submit(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_request **o_srp); static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp); +static int sg_wait_event_srp(struct sg_fd *sfp, void __user *p, + struct sg_io_v4 *h4p, struct sg_request *srp); +static int sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, + void __user *p, struct sg_io_v4 *h4p); static int sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer); static void sg_remove_sgat(struct sg_request *srp); @@ -318,6 +325,7 @@ static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp); static void sg_remove_sfp(struct kref *); static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag); +static bool sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp); static struct sg_request *sg_add_request(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len); static int sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, @@ -765,6 +773,8 @@ sg_v3_submit(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_comm_wr_t cwr; /* now doing v3 blocking (sync) or non-blocking submission */ + if (hp->flags & SGV4_FLAG_MULTIPLE_REQS) + return -ERANGE; /* need to use v4 interface */ if (hp->flags & SG_FLAG_MMAP_IO) { if (!list_empty(&sfp->rq_list)) return -EBUSY; /* already active requests 
on fd */ @@ -792,6 +802,432 @@ sg_v3_submit(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, return 0; } +static void +sg_sgv4_out_zero(struct sg_io_v4 *h4p) +{ + h4p->driver_status = 0; + h4p->transport_status = 0; + h4p->device_status = 0; + h4p->retry_delay = 0; + h4p->info = 0; + h4p->response_len = 0; + h4p->duration = 0; + h4p->din_resid = 0; + h4p->dout_resid = 0; + h4p->generated_tag = 0; + h4p->spare_out = 0; +} + +/* + * Takes a pointer to the controlling multiple request (mrq) object and a + * pointer to the command array. The command array (with tot_reqs elements) + * is written out (flushed) to user space pointer cv4p->din_xferp. The + * secondary error value (s_res) is placed in the array last element's + * spare_out field iff it was zero. Returns 0 on success. + */ +static int +sg_mrq_arr_flush(struct sg_io_v4 *cv4p, struct sg_io_v4 *arr_siv4p, + u32 tot_reqs, int s_res) +{ + u32 sz = tot_reqs * SZ_SG_IO_V4; + void __user *p = uptr64(cv4p->din_xferp); + + if (s_res) { + arr_siv4p[tot_reqs - 1].spare_out = s_res; + cv4p->spare_out = s_res; + } + if (!p) + return 0; + if (sz > cv4p->din_xfer_len) + sz = cv4p->din_xfer_len; + if (sz > 0) { + if (copy_to_user(p, arr_siv4p, sz)) + return -EFAULT; + } + return 0; +} + +static int +sg_mrq_1complet(struct sg_io_v4 *cv4p, struct sg_io_v4 *arr_siv4p, + struct sg_fd *w_sfp, struct sg_fd *sec_sfp, int tot_reqs, + struct sg_request *srp) +{ + int s_res, indx; + struct sg_io_v4 *siv4p; + + SG_LOG(3, w_sfp->parentdp, "%s: start\n", __func__); + if (!srp) + return -EPROTO; + indx = srp->s_hdr4.mrq_ind; + if (indx < 0 || indx >= tot_reqs) + return -EPROTO; + siv4p = arr_siv4p + indx; + s_res = sg_v4_receive(w_sfp, srp, NULL, siv4p); + if (s_res == -EFAULT) + return s_res; + siv4p->info |= SG_INFO_MRQ_FINI; + if (siv4p->flags & SGV4_FLAG_SIG_ON_OTHER) { + s_res = sg_mrq_arr_flush(cv4p, arr_siv4p, tot_reqs, s_res); + if (unlikely(s_res)) /* can only be -EFAULT */ + return s_res; + kill_fasync(&sec_sfp->async_qp, SIGPOLL, POLL_IN); + } + return 0; +} + +/* + * This is a fair-ish algorithm for an interruptible wait on two file + * descriptors. It favours the main fd over the secondary fd (sec_sfp). 
+ */ +static int +sg_mrq_complets(struct sg_io_v4 *cv4p, struct sg_io_v4 *arr_siv4p, + struct sg_fd *sfp, struct sg_fd *sec_sfp, int tot_reqs, + int mreqs, int sec_reqs) +{ + int res; + int sum_inflight = mreqs + sec_reqs; /* may be < tot_reqs */ + struct sg_request *srp; + + SG_LOG(3, sfp->parentdp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, + mreqs, sec_reqs); + for ( ; sum_inflight > 0; --sum_inflight) { + srp = NULL; + if (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) { + if (IS_ERR(srp)) { /* -ENODATA: no mrqs here */ + mreqs = 0; + } else { + --mreqs; + res = sg_mrq_1complet(cv4p, arr_siv4p, sfp, + sec_sfp, tot_reqs, srp); + if (unlikely(res)) + return res; + } + } else if (sec_reqs > 0 && + sg_mrq_get_ready_srp(sec_sfp, &srp)) { + if (IS_ERR(srp)) { + sec_reqs = 0; + } else { + --sec_reqs; + res = sg_mrq_1complet(cv4p, arr_siv4p, sec_sfp, + sec_sfp, tot_reqs, srp); + if (unlikely(res)) + return res; + } + } else if (mreqs > 0) { + res = wait_event_interruptible + (sfp->read_wait, + sg_mrq_get_ready_srp(sfp, &srp)); + if (unlikely(res)) + return res; /* signal --> -ERESTARTSYS */ + if (IS_ERR(srp)) { + mreqs = 0; + } else { + --mreqs; + res = sg_mrq_1complet(cv4p, arr_siv4p, sfp, + sec_sfp, tot_reqs, srp); + if (unlikely(res)) + return res; + } + } else if (sec_reqs > 0) { + res = wait_event_interruptible + (sfp->read_wait, + sg_mrq_get_ready_srp(sec_sfp, &srp)); + if (unlikely(res)) + return res; /* signal --> -ERESTARTSYS */ + if (IS_ERR(srp)) { + sec_reqs = 0; + } else { + --sec_reqs; + res = sg_mrq_1complet(cv4p, arr_siv4p, sec_sfp, + sec_sfp, tot_reqs, srp); + if (unlikely(res)) + return res; + } + } else { /* expect one of the above conditions to be true */ + return -EPROTO; + } + if (cv4p->din_xfer_len > 0) + --cv4p->din_resid; + } + return 0; +} + +static int +sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cv4p, + struct sg_io_v4 *arr_siv4p, u8 *cdb_ap, struct sg_fd *o_sfp, + u8 tot_reqs) +{ + bool immed = !!(cv4p->flags & SGV4_FLAG_IMMED); + bool have_mrq_sense = (cv4p->response && cv4p->max_response_len); + int k; + u32 cdb_alen = cv4p->request_len; + u32 cdb_mxlen = cdb_alen / tot_reqs; + u32 flags; + struct sg_io_v4 *siv4p; + __maybe_unused const char *rip = "request index"; + + /* Pre-check each request for anomalies */ + for (k = 0, siv4p = arr_siv4p; k < tot_reqs; ++k, ++siv4p) { + flags = siv4p->flags; + sg_sgv4_out_zero(siv4p); + if (siv4p->guard != 'Q' || siv4p->protocol != 0 || + siv4p->subprotocol != 0) { + SG_LOG(1, sdp, "%s: req index %u: %s or protocol\n", + __func__, k, "bad guard"); + return -ERANGE; + } + if (flags & SGV4_FLAG_MULTIPLE_REQS) { + SG_LOG(1, sdp, "%s: %s %u: no nested multi-reqs\n", + __func__, rip, k); + return -ERANGE; + } + if (immed) { /* only accept async submits on current fd */ + if (flags & SGV4_FLAG_DO_ON_OTHER) { + SG_LOG(1, sdp, "%s: %s %u, %s\n", __func__, + rip, k, "no IMMED with ON_OTHER"); + return -ERANGE; + } else if (flags & SGV4_FLAG_SHARE) { + SG_LOG(1, sdp, "%s: %s %u, %s\n", __func__, + rip, k, "no IMMED with FLAG_SHARE"); + return -ERANGE; + } else if (flags & SGV4_FLAG_COMPLETE_B4) { + SG_LOG(1, sdp, "%s: %s %u, %s\n", __func__, + rip, k, "no IMMED with COMPLETE_B4"); + return -ERANGE; + } + } + if (!o_sfp) { + if (flags & SGV4_FLAG_SHARE) { + SG_LOG(1, sdp, "%s: %s %u, no share\n", + __func__, rip, k); + return -ERANGE; + } else if (flags & SGV4_FLAG_SIG_ON_OTHER) { + SG_LOG(1, sdp, "%s: %s %u, %s signal on\n", + __func__, rip, k, "no other fd to"); + return -ERANGE; + } else if (flags & SGV4_FLAG_DO_ON_OTHER) { + 
SG_LOG(1, sdp, "%s: %s %u, %s do on\n", + __func__, rip, k, "no other fd to"); + return -ERANGE; + } + } + if (cdb_ap) { + if (siv4p->request_len > cdb_mxlen) { + SG_LOG(1, sdp, "%s: %s %u, cdb too long\n", + __func__, rip, k); + return -ERANGE; + } + } + if (have_mrq_sense && siv4p->response == 0 && + siv4p->max_response_len == 0) { + siv4p->response = cv4p->response; + siv4p->max_response_len = cv4p->max_response_len; + } + } + return 0; +} + +/* + * Implements the multiple request functionality. When blocking is true + * invocation was via ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT). + * Only fully non-blocking if IMMED flag given or when ioctl(SG_IOSUBMIT) + * is used with O_NONBLOCK set on its file descriptor. + */ +static int +sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking) +{ + bool set_this, set_other, immed, stop_if, f_non_block; + int res = 0; + int s_res = 0; /* for partial error, some good then ... */ + int other_fp_sent = 0; + int this_fp_sent = 0; + const int shr_complet_b4 = SGV4_FLAG_SHARE | SGV4_FLAG_COMPLETE_B4; + unsigned long ul_timeout; + struct sg_io_v4 *cv4p = cwrp->h4p; + u32 k, n, flags, cdb_mxlen; + u32 blen = cv4p->dout_xfer_len; + u32 cdb_alen = cv4p->request_len; + u32 tot_reqs = blen / SZ_SG_IO_V4; + struct sg_io_v4 *siv4p; + u8 *cdb_ap = NULL; + struct sg_io_v4 *arr_siv4p; + struct sg_fd *fp = cwrp->sfp; + struct sg_fd *o_sfp = fp->share_sfp; + struct sg_fd *rq_sfp; + struct sg_request *srp; + struct sg_device *sdp = fp->parentdp; + + f_non_block = !!(fp->filp->f_flags & O_NONBLOCK); + immed = !!(cv4p->flags & SGV4_FLAG_IMMED); + stop_if = !!(cv4p->flags & SGV4_FLAG_STOP_IF); + if (blocking) { /* came from ioctl(SG_IO) */ + if (unlikely(immed)) { + SG_LOG(1, sdp, "%s: ioctl(SG_IO) %s contradicts\n", + __func__, "with SGV4_FLAG_IMMED"); + return -ERANGE; + } + if (unlikely(f_non_block)) { + SG_LOG(6, sdp, "%s: ioctl(SG_IO) %s O_NONBLOCK\n", + __func__, "ignoring"); + f_non_block = false; + } + } + if (!immed && f_non_block) + immed = true; + SG_LOG(3, sdp, "%s: %s, tot_reqs=%u, cdb_alen=%u\n", __func__, + (immed ? "IMMED" : (blocking ? 
"ordered blocking" : + "variable blocking")), tot_reqs, cdb_alen); + sg_sgv4_out_zero(cv4p); + + if (unlikely(tot_reqs > U16_MAX)) { + return -ERANGE; + } else if (unlikely(blen > SG_MAX_MULTI_REQ_SZ || + cdb_alen > SG_MAX_MULTI_REQ_SZ)) { + return -E2BIG; + } else if (unlikely(immed && stop_if)) { + return -ERANGE; + } else if (unlikely(tot_reqs == 0)) { + return 0; + } else if (unlikely(!!cdb_alen != !!cv4p->request)) { + return -ERANGE; /* both must be zero or both non-zero */ + } else if (cdb_alen) { + if (unlikely(cdb_alen % tot_reqs)) + return -ERANGE; + cdb_mxlen = cdb_alen / tot_reqs; + if (unlikely(cdb_mxlen < 6)) + return -ERANGE; /* too short for SCSI cdbs */ + } else { + cdb_mxlen = 0; + } + + if (unlikely(atomic_read(&sdp->detaching))) + return -ENODEV; + else if (unlikely(o_sfp && atomic_read(&o_sfp->parentdp->detaching))) + return -ENODEV; + + arr_siv4p = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN); + if (!arr_siv4p) + return -ENOMEM; + n = tot_reqs * SZ_SG_IO_V4; + if (copy_from_user(arr_siv4p, cuptr64(cv4p->dout_xferp), n)) { + res = -EFAULT; + goto fini; + } + if (cdb_alen > 0) { + cdb_ap = kcalloc(tot_reqs, cdb_alen / tot_reqs, + GFP_KERNEL | __GFP_NOWARN); + if (unlikely(!cdb_ap)) { + res = -ENOMEM; + goto fini; + } + if (copy_from_user(cdb_ap, cuptr64(cv4p->request), cdb_alen)) { + res = -EFAULT; + goto fini; + } + } + /* do sanity checks on all requests before starting */ + res = sg_mrq_sanity(sdp, cv4p, arr_siv4p, cdb_ap, o_sfp, tot_reqs); + if (unlikely(res)) + goto fini; + set_this = false; + set_other = false; + /* Dispatch requests and optionally wait for response */ + for (k = 0, siv4p = arr_siv4p; k < tot_reqs; ++k, ++siv4p) { + flags = siv4p->flags; + if (flags & SGV4_FLAG_DO_ON_OTHER) { + rq_sfp = o_sfp; + if (!set_other) { + set_other = true; + set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm); + } + } else { + rq_sfp = fp; + if (!set_this) { + set_this = true; + set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm); + } + } + if (cdb_ap) { /* already have array of cdbs */ + cwrp->cmdp = cdb_ap + (k * cdb_mxlen); + cwrp->u_cmdp = NULL; + } else { /* fetch each cdb from user space */ + cwrp->cmdp = NULL; + cwrp->u_cmdp = cuptr64(siv4p->request); + } + cwrp->cmd_len = siv4p->request_len; + ul_timeout = msecs_to_jiffies(siv4p->timeout); + cwrp->frq_bm[0] = 0; + assign_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm, (int)blocking); + set_bit(SG_FRQ_IS_V4I, cwrp->frq_bm); + set_bit(SG_FRQ_MULTI_REQ, cwrp->frq_bm); + cwrp->h4p = siv4p; + cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX); + cwrp->sfp = rq_sfp; + srp = sg_common_write(cwrp); + if (IS_ERR(srp)) { + s_res = PTR_ERR(srp); + break; + } + srp->s_hdr4.mrq_ind = k; + if (immed || (!(blocking || (flags & shr_complet_b4)))) { + if (fp == rq_sfp) + ++this_fp_sent; + else + ++other_fp_sent; + continue; /* defer completion until all submitted */ + } + s_res = sg_wait_event_srp(rq_sfp, NULL, siv4p, srp); + if (s_res) { + if (s_res == -ERESTARTSYS) { + res = s_res; + goto fini; + } + break; + } + if (!srp) { + s_res = -EPROTO; + break; + } + siv4p->info |= SG_INFO_MRQ_FINI; + if (stop_if && (siv4p->driver_status || + siv4p->transport_status || + siv4p->device_status)) { + SG_LOG(2, sdp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n", + __func__, "STOP_IF and status [drv/tran/scsi", + siv4p->driver_status, siv4p->transport_status, + siv4p->device_status); + break; /* cv4p::driver_status <-- 0 in this case */ + } + if (siv4p->flags & SGV4_FLAG_SIG_ON_OTHER) { + res = sg_mrq_arr_flush(cv4p, arr_siv4p, tot_reqs, + s_res); + if (unlikely(res)) 
+ break; + kill_fasync(&o_sfp->async_qp, SIGPOLL, POLL_IN); + } + } /* end of dispatch request and optionally wait loop */ + cv4p->dout_resid = tot_reqs - k; + cv4p->info = k; + if (cv4p->din_xfer_len > 0) + cv4p->din_resid = cv4p->din_xfer_len / SZ_SG_IO_V4; + + if (immed) + goto fini; + + if (res == 0 && (this_fp_sent + other_fp_sent) > 0) { + s_res = sg_mrq_complets(cv4p, arr_siv4p, fp, o_sfp, tot_reqs, + this_fp_sent, other_fp_sent); + if (s_res == -EFAULT || s_res == -ERESTARTSYS) + res = s_res; /* this may leave orphans */ + } +fini: + if (res == 0 && !immed) + res = sg_mrq_arr_flush(cv4p, arr_siv4p, tot_reqs, s_res); + kfree(cdb_ap); + kfree(arr_siv4p); + return res; +} + static int sg_v4_submit(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp) @@ -804,6 +1240,24 @@ sg_v4_submit(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, memset(&cwr, 0, sizeof(cwr)); cwr.sfp = sfp; cwr.h4p = h4p; + if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) { + /* want v4 async or sync with guard, din and dout and flags */ + if (!h4p->dout_xferp || h4p->din_iovec_count || + h4p->dout_iovec_count || + (h4p->dout_xfer_len % SZ_SG_IO_V4)) + return -ERANGE; + if (o_srp) + *o_srp = NULL; + res = sg_do_multi_req(&cwr, sync); + if (unlikely(res)) + return res; + if (p) { + /* Write back sg_io_v4 object for error/warning info */ + if (copy_to_user(p, h4p, SZ_SG_IO_V4)) + return -EFAULT; + } + return 0; + } if (h4p->flags & SG_FLAG_MMAP_IO) { int len = 0; @@ -1021,7 +1475,11 @@ sg_common_write(struct sg_comm_wr_t *cwrp) dxfr_len = h4p->dout_xfer_len; dir = SG_DXFER_TO_DEV; } - } else { /* sg v3 interface so hi_p valid */ + /* Only allow SGV4_FLAG_DO_ON_OTHER on mrq */ + if (!test_bit(SG_FRQ_MULTI_REQ, cwrp->frq_bm) && + (rq_flags & SGV4_FLAG_DO_ON_OTHER)) + return ERR_PTR(-ERANGE); + } else { /* sg v3 interface so hi_p valid */ h4p = NULL; hi_p = cwrp->h3p; dir = hi_p->dxfer_direction; @@ -1029,6 +1487,8 @@ sg_common_write(struct sg_comm_wr_t *cwrp) rq_flags = hi_p->flags; pack_id = hi_p->pack_id; } + if (rq_flags & SGV4_FLAG_MULTIPLE_REQS) + return ERR_PTR(-ERANGE); if (fp->shr_fd == SG_SHARE_FD_UNUSED) { /* no sharing established on this fd */ sh_var = SG_SHR_NONE; @@ -1314,6 +1774,103 @@ sg_v4_receive(struct sg_fd *sfp, struct sg_request *srp, void __user *p, return err < 0 ? err : 0; } +/* + * Returns negative on error including -ENODATA if there are no mrqs submitted + * nor waiting. Otherwise it returns the number of elements written to + * rsp_arr, which may be 0 if mrqs submitted but none waiting + */ +static int +sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, + struct sg_io_v4 *rsp_arr) +{ + int k; + int res = 0; + struct sg_request *srp; + + SG_LOG(3, sfp->parentdp, "%s: max_mrqs=%d\n", __func__, max_mrqs); + for (k = 0; k < max_mrqs; ++k) { + if (!sg_mrq_get_ready_srp(sfp, &srp)) + break; + if (!srp) + return -EPROTO; + if (IS_ERR(srp)) + return k ? k : PTR_ERR(srp); + res = sg_v4_receive(sfp, srp, NULL, rsp_arr + k); + if (unlikely(res)) + return res; + rsp_arr[k].info |= SG_INFO_MRQ_FINI; + } + if (non_block) + return k; + + for ( ; k < max_mrqs; ++k) { + res = wait_event_interruptible + (sfp->read_wait, + sg_mrq_get_ready_srp(sfp, &srp)); + if (unlikely(res)) + return res; /* signal --> -ERESTARTSYS */ + if (unlikely(!srp)) + return -EPROTO; + if (IS_ERR(srp)) + return k ? 
k : PTR_ERR(srp); + res = sg_v4_receive(sfp, srp, NULL, rsp_arr + k); + if (unlikely(res)) + return res; + rsp_arr[k].info |= SG_INFO_MRQ_FINI; + } + return k; +} + +/* + * Expected race as multiple concurrent calls with the same pack_id/tag can + * occur. Only one should succeed per request (more may succeed but will get + * different requests). + */ +static int +sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cv4p, void __user *p, + bool non_block) +{ + int res = 0; + u32 len, n; + struct sg_io_v4 *rsp_v4_arr; + void __user *pp; + + SG_LOG(3, sfp->parentdp, "%s: non_block=%d\n", __func__, !!non_block); + n = cv4p->din_xfer_len; + if (n > SG_MAX_MULTI_REQ_SZ) + return -E2BIG; + if (!cv4p->din_xferp || n < SZ_SG_IO_V4 || (n % SZ_SG_IO_V4)) + return -ERANGE; + n /= SZ_SG_IO_V4; + len = n * SZ_SG_IO_V4; + SG_LOG(3, sfp->parentdp, "%s: %s, num_reqs=%u\n", __func__, + (non_block ? "IMMED" : "blocking"), n); + rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL); + if (!rsp_v4_arr) + return -ENOMEM; + + sg_sgv4_out_zero(cv4p); + cv4p->din_resid = n; + res = sg_mrq_iorec_complets(sfp, non_block, n, rsp_v4_arr); + if (unlikely(res < 0)) + goto fini; + cv4p->din_resid -= res; + cv4p->info = res; + if (copy_to_user(p, cv4p, sizeof(*cv4p))) + return -EFAULT; + res = 0; + pp = uptr64(cv4p->din_xferp); + if (pp) { + if (copy_to_user(pp, rsp_v4_arr, len)) + res = -EFAULT; + } else { + pr_info("%s: cv4p->din_xferp==NULL ?_?\n", __func__); + } +fini: + kfree(rsp_v4_arr); + return res; +} + /* * Called when ioctl(SG_IORECEIVE) received. Expects a v4 interface object. * Checks if O_NONBLOCK file flag given, if not checks given 'flags' field @@ -1346,6 +1903,8 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p) if (h4p->flags & SGV4_FLAG_IMMED) non_block = true; /* set by either this or O_NONBLOCK */ SG_LOG(3, sdp, "%s: non_block(+IMMED)=%d\n", __func__, non_block); + if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) + return sg_mrq_ioreceive(sfp, h4p, p, non_block); /* read in part of v3 or v4 header for pack_id or tag based find */ if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm); @@ -1397,6 +1956,8 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p) if (h3p->flags & SGV4_FLAG_IMMED) non_block = true; /* set by either this or O_NONBLOCK */ SG_LOG(3, sdp, "%s: non_block(+IMMED)=%d\n", __func__, non_block); + if (h3p->flags & SGV4_FLAG_MULTIPLE_REQS) + return -EINVAL; if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) pack_id = h3p->pack_id; @@ -4328,6 +4889,57 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag) return srp; } +/* + * Returns pointer to first non-sync_invoc req waiting to be read. If none + * found returns NULL unless there were no such reqs on active list, in which + * case it returns ERR_PTR(-ENODATA). 
+ */ +static struct sg_request * +sg_mrq_get_rq(struct sg_fd *sfp) +{ + bool any_rqs = false; + bool got1 = false; + enum sg_rq_state sr_st; + struct sg_request *srp; + + rcu_read_lock(); + list_for_each_entry_rcu(srp, &sfp->rq_list, rq_entry) { + if (SG_IS_SYNC_INVOC(srp)) + continue; + any_rqs = true; /* only count async reqs */ + sr_st = atomic_read(&srp->rq_st); + if (sr_st == SG_RS_AWAIT_RD) { + if (likely(sg_rstate_chg(srp, sr_st, + SG_RS_BUSY) == 0)) { + got1 = true; + break; + } + } + } + rcu_read_unlock(); + if (got1) + return srp; + else if (any_rqs) + return NULL; + else + return ERR_PTR(-ENODATA); +} + +/* Note, *srpp may not be a valid pointer, might be ERR_PTR(-ENODATA) */ +static bool +sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp) +{ + struct sg_request *srp; + + if (unlikely(atomic_read(&sfp->parentdp->detaching))) { + *srpp = NULL; + return true; + } + srp = sg_mrq_get_rq(sfp); + *srpp = srp; + return !!srp; +} + /* * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which * may take time but has improved chance of success, otherwise use GFP_ATOMIC. @@ -5434,7 +6046,7 @@ sg_proc_seq_show_red_dbg(struct seq_file *s, void *v) return sg_proc_seq_show_dbg(s, v, true); } -#endif /* CONFIG_SCSI_PROC_FS (~600 lines back) */ +#endif /* CONFIG_SCSI_PROC_FS (~500 lines back) */ module_init(init_sg); module_exit(exit_sg); diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h index 19d7321e7df6..b299e7d1b51d 100644 --- a/include/uapi/scsi/sg.h +++ b/include/uapi/scsi/sg.h @@ -31,7 +31,11 @@ #include #include -/* bsg.h contains the sg v4 user space interface structure (sg_io_v4). */ +/* + * bsg.h contains the sg v4 user space interface structure (sg_io_v4). + * That structure is also used as the controlling object when multiple + * requests are issued with one ioctl() call. 
+ */ #include /* @@ -109,10 +113,15 @@ typedef struct sg_io_hdr { #define SGV4_FLAG_YIELD_TAG 0x8 /* sg_io_v4::request_tag set after SG_IOS */ #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD +#define SGV4_FLAG_COMPLETE_B4 0x100 +#define SGV4_FLAG_SIG_ON_OTHER 0x200 #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */ +#define SGV4_FLAG_STOP_IF 0x800 /* Stops sync mrq if error or warning */ #define SGV4_FLAG_DEV_SCOPE 0x1000 /* permit SG_IOABORT to have wider scope */ #define SGV4_FLAG_SHARE 0x2000 /* share IO buffer; needs SG_SEIM_SHARE_FD */ +#define SGV4_FLAG_DO_ON_OTHER 0x4000 /* available on either of shared pair */ #define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* needed for sharing */ +#define SGV4_FLAG_MULTIPLE_REQS 0x20000 /* n sg_io_v4s in data-in */ /* Output (potentially OR-ed together) in v3::info or v4::info field */ #define SG_INFO_OK_MASK 0x1

From patchwork Fri May 24 18:48:08 2019
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960351
From: Douglas Gilbert
To: linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com
Subject: [PATCH 18/19] sg: add slave wait capability
Date: Fri, 24 May 2019 14:48:08 -0400
Message-Id: <20190524184809.25121-19-dgilbert@interlog.com>
In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com>
References: <20190524184809.25121-1-dgilbert@interlog.com>
X-Virus-Scanned: ClamAV
In request sharing, the slave side (i.e. the WRITEr) usually needs to
wait for the master side to complete before the slave can issue its
WRITE (or any other data-out command). This small optimisation allows
the slave WRITE to be submitted directly after the master side submits
its command (e.g. a READ). That slave command cannot be executed until
the data from the master command becomes available, but the slave
command (i.e. its cdb) can be fetched and checked, and the resources it
needs can be obtained. Also, when the master side completes, its
callback can kick off the slave command before returning to user space
with the response. No context switches between user and kernel space
are saved, but the context switches for the slave submission and the
master response no longer necessarily hold up IO. (A rough userspace
sketch of this flow appears after the final patch in this series.)

See the webpage at: http://sg.danny.cz/sg/sg_v40.html in the section
titled: "7.1 Slave waiting"

This patch adds 120 lines and an extra state to the sg_request state
machine. It may cost more than it is worth and so has been added
(almost) last.

Signed-off-by: Douglas Gilbert
---
 drivers/scsi/sg.c      | 147 ++++++++++++++++++++++++++++++++++++++---
 include/uapi/scsi/sg.h |  24 ++++---
 2 files changed, 154 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 92a0226cabc2..d048c1f371ce 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -12,9 +12,9 @@
  *
  */
 
-static int sg_version_num = 40001;	/* 2 digits for each component */
-#define SG_VERSION_STR "4.0.01"
-static char *sg_version_date = "20190520";
+static int sg_version_num = 40012;	/* 2 digits for each component */
+#define SG_VERSION_STR "4.0.12"
+static char *sg_version_date = "20190521";
 
 #include 
 
@@ -92,6 +92,7 @@ enum sg_rq_state {
 	SG_RS_BUSY,		/* temporary state should rarely be seen */
 	SG_RS_SHR_SWAP,		/* swap: master finished, awaiting slave */
 	SG_RS_SHR_SLAVE,	/* master waits while slave inflight */
+	SG_RS_SHR_SWAIT,	/* slave waiting for master to complete */
 };
 
 /* slave sets up sharing: ioctl(sl_fd,SG_SET_GET_EXTENDED(SHARE_FD(ma_fd))) */
@@ -132,6 +133,7 @@ enum sg_shr_var {
 #define SG_FRQ_DIO_IN_USE	3	/* false->indirect_IO,mmap; 1->dio */
 #define SG_FRQ_NO_US_XFER	4	/* no user space transfer of data */
 #define SG_FRQ_ABORTING		5	/* in process of aborting this cmd */
+#define SG_FRQ_DEFER_XFR	6	/* slave waiting will defer transfer */
 #define SG_FRQ_DEACT_ORPHAN	7	/* not keeping orphan so de-activate */
 #define SG_FRQ_MULTI_REQ	8	/* part of a multiple request series */
 #define SG_FRQ_BLK_PUT_REQ	9	/* set when blk_put_request() called */
@@ -1386,6 +1388,18 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 	unsigned long iflags;
 	struct sg_device *sdp = sfp->parentdp;
 
+	if (unlikely(srp->sh_var == SG_SHR_SL_RQ && sfp->share_sfp)) {
+		if (test_and_clear_bit(SG_FRQ_DEFER_XFR, srp->frq_bm)) {
+			int res = sg_rq_map_kern(srp,
+						 sdp->device->request_queue,
+						 srp->rq);
+
+			SG_LOG(3, sdp, "%s: deferred xfer\n", __func__);
+			if (unlikely(res))
+				pr_warn("%s: sg_rq_map_kern() --> %d\n",
+					__func__, res);
+		}
+	}
 	is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
 	sync = SG_IS_SYNC_INVOC(srp);
 	SG_LOG(3, sdp, "%s: is_v4h=%d\n", __func__, (int)is_v4h);
@@ -1438,6 +1452,56 @@ sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st,
 	return -EPROTOTYPE;
 }
 
+/*
+ * Called when a shared slave request is being built and SG_FRQ_DEFER_XFR is
+ * set. Returns 0 for SWAIT (i.e. deferred execution), SZ_SG_IO_V4 (positive)
+ * for execute now, and a negated errno value for an error.
+ */
+static int
+sg_process_swait(struct sg_fd *sl_sfp, struct sg_request *sl_srp)
+{
+	int res = 0;
+	unsigned long iflags;
+	enum sg_rq_state ma_sr_st;
+	struct sg_fd *ma_sfp = sl_sfp->share_sfp;
+	struct sg_request *ma_rsv_srp;
+
+	if (!ma_sfp || !ma_sfp->rsv_srp)
+		return -EPROTO;
+	ma_rsv_srp = ma_sfp->rsv_srp;
+	spin_lock_irqsave(&ma_rsv_srp->req_lck, iflags);
+	ma_sr_st = atomic_read(&ma_rsv_srp->rq_st);
+	SG_LOG(3, sl_sfp->parentdp, "%s: ma_rsv_srp=0x%p, ma_sr_st: %s\n",
+	       __func__, ma_rsv_srp, sg_rq_st_str(ma_sr_st, false));
+
+	switch (ma_sr_st) {	/* master's state held constant by spinlock */
+	case SG_RS_INFLIGHT:
+		res = sg_rstate_chg(sl_srp, SG_RS_INFLIGHT, SG_RS_SHR_SWAIT);
+		goto fini;	/* if that worked, res==0 and slave now SWAIT */
+	case SG_RS_AWAIT_RD:
+	case SG_RS_DONE_RD:
+	case SG_RS_SHR_SWAP:
+		res = (ma_rsv_srp->rq_result & SG_ML_RESULT_MSK) ?
+			-ENOSTR : SZ_SG_IO_V4;
+		break;
+	case SG_RS_BUSY:
+		res = -EBUSY;
+		break;
+	case SG_RS_INACTIVE:
+		res = -EAGAIN;
+		break;
+	case SG_RS_SHR_SLAVE:
+		res = -EDOM;
+		break;
+	default:
+		res = -EPROTO;
+		break;
+	}
fini:
+	spin_unlock_irqrestore(&ma_rsv_srp->req_lck, iflags);
+	return res;
+}
+
 /*
  * All writes and submits converge on this function to launch the SCSI
  * command/request (via blk_execute_rq_nowait). Returns a pointer to a
@@ -1535,6 +1599,15 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	}
 	srp->rq->timeout = cwrp->timeout;
+	if (sh_var == SG_SHR_SL_RQ &&
+	    test_bit(SG_FRQ_DEFER_XFR, srp->frq_bm)) {
+		res = sg_process_swait(fp, srp);
+		if (!res)
+			return srp;
+		else if (unlikely(res < 0))
+			goto err_out;
+		/* fallthrough when res is SZ_SG_IO_V4 to execute now */
+	}
 	sg_execute_cmd(fp, srp);
 	return srp;
 err_out:
@@ -2501,6 +2574,7 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 		is_dur = true;	/* completion has occurred, timing finished */
 		break;
 	case SG_RS_INACTIVE:
+	case SG_RS_SHR_SWAIT:
 	default:
 		break;
 	}
@@ -2731,6 +2805,14 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		if (srp->rq)
 			blk_abort_request(srp->rq);
 		break;
+	case SG_RS_SHR_SWAIT:
+		srp->rq_result |= (DRIVER_SOFT << 24);
+		if (sg_rstate_chg(srp, SG_RS_SHR_SWAIT, SG_RS_AWAIT_RD))
+			pr_warn("%s: unable to set rq_st?\n", __func__);
+		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		break;
 	default:
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
 		break;
@@ -3906,6 +3988,7 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 	struct sg_request *srp = container_of(work, struct sg_request,
 					      ew_orph.work);
 	struct sg_fd *sfp;
+	struct sg_request *slrp;
 
 	if (unlikely(!srp)) {
 		WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
 		return;
 	}
@@ -3916,7 +3999,40 @@
 		WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
 		return;
 	}
-	SG_LOG(3, sfp->parentdp, "%s: srp=0x%p\n", __func__, srp);
+	slrp = sfp->slave_srp;
+	SG_LOG(3, sfp->parentdp, "%s: %s srp=0x%p, rq_st: %s, share: %s\n",
+	       __func__, (slrp ?
"swait" : "clean"), srp, + sg_rq_st_str(atomic_read(&srp->rq_st), false), + sg_shr_str(srp->sh_var, false)); + if (slrp) { + unsigned long iflags; + enum sg_rq_state rqq_st; + struct sg_fd *slave_sfp = slrp->parentfp; + + spin_lock_irqsave(&slrp->req_lck, iflags); + rqq_st = atomic_read(&slrp->rq_st); + if (slave_sfp && rqq_st == SG_RS_SHR_SWAIT) { + if (!(srp->rq_result & SG_ML_RESULT_MSK)) { + /* master is error/sense free */ + sg_rstate_chg(slrp, rqq_st, SG_RS_INFLIGHT); + spin_unlock_irqrestore(&slrp->req_lck, iflags); + sg_execute_cmd(slave_sfp, slrp); + goto chk_second; + } + /* end slave cmd with ::driver_status=DRIVER_SOFT */ + sg_rstate_chg(slrp, rqq_st, SG_RS_AWAIT_RD); + spin_unlock_irqrestore(&slrp->req_lck, iflags); + slrp->rq_result |= (DRIVER_SOFT << 24); + if (slave_sfp) { + wake_up_interruptible(&slave_sfp->read_wait); + kill_fasync(&slave_sfp->async_qp, SIGPOLL, + POLL_IN); + } + } else { + spin_unlock_irqrestore(&slrp->req_lck, iflags); + } + } +chk_second: if (test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm)) { sg_finish_scsi_blk_rq(srp); /* clean up orphan case */ sg_deact_request(sfp, srp); @@ -4062,8 +4178,15 @@ sg_rq_end_io(struct request *rq, blk_status_t status) /* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); - kref_put(&sfp->f_ref, sg_remove_sfp); - } else { /* clean up orphaned request that aren't being kept */ + if (sfp->slave_srp && /* schedule waiting slave rq */ + atomic_read(&sfp->slave_srp->rq_st) == SG_RS_SHR_SWAIT) { + INIT_WORK(&srp->ew_orph.work, + sg_rq_end_io_usercontext); + schedule_work(&srp->ew_orph.work); + } else { + kref_put(&sfp->f_ref, sg_remove_sfp); + } + } else { /* clean up orphaned request that aren't being kept */ INIT_WORK(&srp->ew_orph.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew_orph.work); } @@ -4573,7 +4696,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir) if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res) SG_LOG(1, sdp, "%s: blk_rq_map_user() res=%d\n", __func__, res); - } else { + } else if (!test_bit(SG_FRQ_DEFER_XFR, srp->frq_bm)) { /* transfer data to/from kernel buffers */ res = sg_rq_map_kern(srp, q, rq); } @@ -4840,6 +4963,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag) case SG_RS_SHR_SLAVE: goto good; case SG_RS_INFLIGHT: + case SG_RS_SHR_SWAIT: /* awaiting master's finish */ break; default: if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) { @@ -5040,17 +5164,18 @@ sg_add_request(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len) bool allow_rsv = true; bool mk_new_srp = true; bool sl_req = false; + bool sl_swait = false; u32 sum_dlen; unsigned long iflags; enum sg_rq_state sr_st; - enum sg_rq_state ma_sr_st; + enum sg_rq_state uninitialized_var(ma_sr_st); struct sg_fd *fp = cwrp->sfp; struct sg_request *r_srp = NULL; /* request to return */ struct sg_request *rsv_srp; /* current fd's reserve request */ struct sg_request *uninitialized_var(ma_rsv_srp); struct sg_fd *uninitialized_var(ma_sfp); __maybe_unused struct sg_device *sdp; - __maybe_unused const char *cp; + __maybe_unused const char *uninitialized_var(cp); char b[48]; spin_lock_irqsave(&fp->rq_list_lock, iflags); @@ -5110,6 +5235,7 @@ sg_add_request(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len) break; case SG_RS_INFLIGHT: sl_req = true; + sl_swait = true; break; case SG_RS_INACTIVE: r_srp = ERR_PTR(-EADDRNOTAVAIL); @@ -5231,6 +5357,7 @@ sg_add_request(struct sg_comm_wr_t *cwrp, enum sg_shr_var 
 		ma_sfp->slave_srp = r_srp;
 		/* slave "shares" the master reserve request's data buffer */
 		r_srp->sgatp = &ma_rsv_srp->sgat_h;
+		assign_bit(SG_FRQ_DEFER_XFR, r_srp->frq_bm, sl_swait);
 	}
 	if (mk_new_srp)
 		spin_lock_irqsave(&fp->rq_list_lock, iflags);
@@ -5530,6 +5657,8 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 		return long_str ? "share swap" : "s_wp";
 	case SG_RS_SHR_SLAVE:	/* only an active master has this */
 		return long_str ? "share slave active" : "sl_a";
+	case SG_RS_SHR_SWAIT:	/* only an active slave has this */
+		return long_str ? "share slave wait" : "sl_w";
 	default:
 		return long_str ? "unknown" : "unk";
 	}
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index b299e7d1b51d..348918f9e0b0 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -14,7 +14,7 @@
  * Later extensions (versions 2, 3 and 4) to driver:
  *	Copyright (C) 1998 - 2018 Douglas Gilbert
  *
- * Version 4.0.11 (20190502)
+ * Version 4.0.12 (20190521)
  * This version is for Linux 4 and 5 series kernels.
  *
  * Documentation
@@ -115,7 +115,7 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
 #define SGV4_FLAG_COMPLETE_B4 0x100
 #define SGV4_FLAG_SIG_ON_OTHER 0x200
-#define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
+#define SGV4_FLAG_IMMED 0x400	/* for polling with SG_IOR, ignored in SG_IOS */
 #define SGV4_FLAG_STOP_IF 0x800	/* Stops sync mrq if error or warning */
 #define SGV4_FLAG_DEV_SCOPE 0x1000	/* permit SG_IOABORT to have wider scope */
 #define SGV4_FLAG_SHARE 0x2000	/* share IO buffer; needs SG_SEIM_SHARE_FD */
@@ -149,7 +149,7 @@ typedef struct sg_scsi_id {
 	short h_cmd_per_lun;/* host (adapter) maximum commands per lun */
 	short d_queue_depth;/* device (or adapter) maximum queue length */
 	union {
-		int unused[2]; /* as per version 3 driver */
+		int unused[2];	/* as per version 3 driver */
 		__u8 scsi_lun[8];	/* full 8 byte SCSI LUN [in v4 driver] */
 	};
 } sg_scsi_id_t;
@@ -161,8 +161,14 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 	/* sg_io_owned set implies synchronous, clear implies asynchronous */
 	char sg_io_owned;	/* 0 -> complete with read(), 1 -> owned by SG_IO */
 	char problem;	/* 0 -> no problem detected, 1 -> error to report */
+	/* If SG_CTL_FLAGM_TAG_FOR_PACK_ID set on fd then next field is tag */
 	int pack_id;	/* pack_id, in v4 driver may be tag instead */
 	void __user *usr_ptr;	/* user provided pointer in v3+v4 interface */
+	/*
+	 * millisecs elapsed since the command started (req_state==1) or
+	 * command duration (req_state==2). Will be in nanoseconds after
+	 * the SG_SET_GET_EXTENDED{TIME_IN_NS} ioctl.
+	 */
 	unsigned int duration;
 	int unused;
 } sg_req_info_t;
@@ -198,9 +204,9 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_CTL_FLAGM_UNSHARE	0x80	/* undo share after inflight cmd */
 /* rd> 1: master finished 0: not; wr> 1: finish share post master */
 #define SG_CTL_FLAGM_MASTER_FINI 0x100	/* wr> 0: setup for repeat slave req */
-#define SG_CTL_FLAGM_MASTER_ERR 0x200 /* rd: sharing, master got error */
+#define SG_CTL_FLAGM_MASTER_ERR	0x200	/* rd: sharing, master got error */
 #define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
-#define SG_CTL_FLAGM_MORE_ASYNC 0x800 /* yield EAGAIN in more cases */
+#define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
 #define SG_CTL_FLAGM_ALL_BITS	0xfff	/* should be OR of previous items */
 
 /* Write one of the following values to sg_extended_info::read_value, get... */
@@ -433,9 +439,11 @@ struct sg_header {
 /*
  * New ioctls to replace async (non-blocking) write()/read() interface.
  * Present in version 4 and later of the sg driver [>20190427]. The
- * SG_IOSUBMIT and SG_IORECEIVE ioctls accept the sg_v4 interface based on
- * struct sg_io_v4 found in . These objects are
- * passed by a pointer in the third argument of the ioctl.
+ * SG_IOSUBMIT_V3 and SG_IORECEIVE_V3 ioctls accept the sg_v3 interface
+ * based on struct sg_io_hdr shown above. The SG_IOSUBMIT and SG_IORECEIVE
+ * ioctls accept the sg_v4 interface based on struct sg_io_v4 found in
+ * . These objects are passed by a pointer in
+ * the third argument of the ioctl.
  *
  * Data may be transferred both to and from the user space by these
  * ioctls. Hence the _IOWR macro is used here to generate the ioctl number

From patchwork Fri May 24 18:48:09 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Douglas Gilbert
X-Patchwork-Id: 10960349
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3414C1390 for ; Fri, 24 May 2019 18:48:48 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 27B3128A24 for ; Fri, 24 May 2019 18:48:48 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1C94428A26; Fri, 24 May 2019 18:48:48 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A33D628A24 for ; Fri, 24 May 2019 18:48:47 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2403773AbfEXSsr (ORCPT ); Fri, 24 May 2019 14:48:47 -0400
Received: from smtp.infotech.no ([82.134.31.41]:56446 "EHLO smtp.infotech.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391167AbfEXSsr (ORCPT ); Fri, 24 May 2019 14:48:47 -0400
Received: from localhost (localhost [127.0.0.1]) by smtp.infotech.no (Postfix) with ESMTP id B535A204170; Fri, 24 May 2019 20:48:44 +0200 (CEST)
X-Virus-Scanned: by amavisd-new-2.6.6 (20110518) (Debian) at infotech.no
Received: from smtp.infotech.no ([127.0.0.1]) by localhost (smtp.infotech.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 33AEue6sCTm8; Fri, 24 May 2019 20:48:43 +0200 (CEST)
Received: from xtwo70.bingwo.ca (host-45-58-224-183.dyn.295.ca [45.58.224.183]) by smtp.infotech.no (Postfix) with ESMTPA id DCD73204197; Fri, 24 May 2019 20:48:33 +0200 (CEST)
From: Douglas Gilbert
To: linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, jejb@linux.vnet.ibm.com, hare@suse.de, bart.vanassche@wdc.com
Subject: [PATCH 19/19] sg: table of error numbers with meanings
Date: Fri, 24 May 2019 14:48:09 -0400
Message-Id: <20190524184809.25121-20-dgilbert@interlog.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190524184809.25121-1-dgilbert@interlog.com>
References: <20190524184809.25121-1-dgilbert@interlog.com>
Sender: linux-scsi-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-scsi@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Rather than having a piece of paper recording which errno values have
been used for what, the author thought: why not place them in one table
in the driver code?
As a guesstimate, over half the code in this driver is dedicated to
sanity checking and reporting errors. Those errors may come from the
host machine, the SCSI HBA or its associated hardware, the transport,
or the storage device. For near-end errors, some creative license is
taken with errno values (e.g. ENOTSOCK) to convey a better sense of
what this driver is objecting to.

Signed-off-by: Douglas Gilbert
---
 drivers/scsi/sg.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d048c1f371ce..4fee90cca852 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -389,6 +389,48 @@ static void sg_rep_rq_state_fail(struct sg_device *sdp,
 #define SG_LOG(depth, sdp, fmt, a...)
 #endif	/* end of CONFIG_SCSI_LOGGING && SG_DEBUG conditional */
 
+/*
+ * Unless otherwise noted, functions that return int will return 0 for
+ * good/successful or a negated errno value. Here is a list of the errno
+ * values generated by this driver:
+ *
+ * E2BIG	sum(dlen) > tot_fd_thresh ; slave dxfer_len > master dxfer_len
+ * EACCES	user (process) does not have sufficient privilege or capabilities
+ * EADDRINUSE	sharing: slave file descriptor already in share
+ * EADDRNOTAVAIL  sharing: master file descriptor already in share;
+ *		slave request but no preceding master request
+ * EAGAIN	[aka EWOULDBLOCK]; occurs when O_NONBLOCK set on open() or
+ *		SGV4_FLAG_IMMED given, and SG_IORECEIVE (or read(2)) not ready
+ * EBUSY	'Device or resource busy'; this uses open(O_NONBLOCK) but another
+ *		has open(O_EXCL); reserve request in use (e.g. when mmap() called)
+ * EDOM		numerical error, command queueing false and second command
+ *		attempted when one already outstanding
+ * EFAULT	problem moving data to or from user space
+ * EIDRM	block request unexpectedly missing
+ * EINTR	interrupted system call (generated by kernel, not this driver)
+ * EINVAL	flags or other input information is contradictory or disallowed
+ * EIO		only kept for backward compatibility, too generic to be useful
+ * ELOOP	sharing: file descriptor can't share with itself
+ * EMSGSIZE	cdb too long (> 252 bytes) or too short (less than 6 bytes)
+ * ENODATA	sharing: no data xfer requested; mmap or direct io problem;
+ *		SG_IOABORT: no match on pack_id or tag; mrq: no active reqs
+ * ENODEV	target (SCSI) device associated with the fd has "disappeared"
+ * ENOMEM	obvious; could be some pre-allocated cache that is full
+ * ENOMSG	data transfer setup needed or (direction) disallowed (sharing)
+ * ENOSTR	slave request abandoned due to master error or state
+ * ENOTSOCK	sharing: file descriptor for sharing unassociated with sg driver
+ * ENXIO	'no such device or address' SCSI mid-level processing errors
+ *		(e.g. command timeouts); also sg info not in 'file' struct
+ * EPERM	not permitted (even if has ACCES); v1+2,v3,v4 interface usage
+ *		violation, opened read-only but SCSI command not listed read-only
+ * EPROTO	logic error (in driver); like "shouldn't get here"
+ * EPROTOTYPE	atomic state change failed unexpectedly
+ * ERANGE	multiple requests: usually bad flag values
+ * ERESTARTSYS	should not be seen in user space, associated with an
+ *		interruptible wait; will restart system call or give EINTR
+ * EWOULDBLOCK	[aka EAGAIN]; additionally if the 'more async' flag is set
+ *		SG_IOSUBMIT may yield this error
+ */
 
 /*
  * The SCSI interfaces that use read() and write() as an asynchronous variant of
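
To show how the table above might be consumed, here is a minimal userspace
sketch in C. It assumes only standard errno constants; sg_v4_errno_str() is a
hypothetical helper, not part of this patch set, and its strings merely
paraphrase a few of the sharing-related table entries:

  #include <errno.h>
  #include <string.h>

  /*
   * Hypothetical userspace helper: translate errno values that this driver
   * overloads (per the table above) into short explanations; fall back to
   * strerror() for the standard kernel meanings.
   */
  static const char *sg_v4_errno_str(int err)
  {
          switch (err) {
          case EADDRINUSE:
                  return "sharing: slave fd already in a share";
          case EADDRNOTAVAIL:
                  return "sharing: master fd already in share, or slave "
                         "request without a preceding master request";
          case EDOM:
                  return "queueing off and a second command attempted";
          case ENOSTR:
                  return "slave request abandoned due to master error or state";
          case ENOTSOCK:
                  return "sharing: other fd not associated with the sg driver";
          case EPROTOTYPE:
                  return "sg driver: atomic state change failed unexpectedly";
          default:
                  return strerror(err);   /* standard kernel meaning */
          }
  }

After a failing ioctl(sg_fd, SG_IOSUBMIT, &hdr) a caller could then report
sg_v4_errno_str(errno) instead of the generic strerror() text.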
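Similarly, the slave-wait flow from the previous patch can be sketched from
user space. This is illustrative only and rests on assumptions: the two file
descriptors are presumed to be already joined in a share via ioctl(sl_fd,
SG_SET_GET_EXTENDED, ...) with SG_SEIM_SHARE_FD (the sg_extended_info layout
is not reproduced in this excerpt), submit_shared_pair() and its cdb
arguments are hypothetical, and the flag placement is inferred from the
commit text rather than taken from the driver:

  #include <errno.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <scsi/sg.h>            /* assumed v4 header with SGV4_* flags */

  /*
   * Sketch only. Assumes ma_fd (master) and sl_fd (slave) are open sg file
   * descriptors already sharing via SG_SET_GET_EXTENDED{SG_SEIM_SHARE_FD},
   * and that rd_cdb/wr_cdb point at valid READ and WRITE cdbs of cdb_len
   * bytes.
   */
  static int submit_shared_pair(int ma_fd, int sl_fd,
                                unsigned char *rd_cdb, unsigned char *wr_cdb,
                                int cdb_len, unsigned int num_bytes)
  {
          struct sg_io_v4 ma_h, sl_h;

          memset(&ma_h, 0, sizeof(ma_h));
          ma_h.guard = 'Q';               /* marks this as a sg_io_v4 object */
          ma_h.request = (uintptr_t)rd_cdb;
          ma_h.request_len = cdb_len;
          ma_h.din_xfer_len = num_bytes;  /* READ data lands in the master's
                                           * reserve buffer, shared with the
                                           * slave */
          ma_h.flags = SGV4_FLAG_SHARE;
          if (ioctl(ma_fd, SG_IOSUBMIT, &ma_h) < 0)
                  return -errno;

          /*
           * With slave waiting, the WRITE may be submitted immediately; the
           * driver parks it (SG_RS_SHR_SWAIT) until the master READ completes
           * without error, then launches it from the master's callback.
           */
          memset(&sl_h, 0, sizeof(sl_h));
          sl_h.guard = 'Q';
          sl_h.request = (uintptr_t)wr_cdb;
          sl_h.request_len = cdb_len;
          sl_h.dout_xfer_len = num_bytes; /* data comes from the shared buffer */
          sl_h.flags = SGV4_FLAG_SHARE | SGV4_FLAG_NO_DXFER;
          if (ioctl(sl_fd, SG_IOSUBMIT, &sl_h) < 0)
                  return -errno;
          return 0;       /* reap both responses with ioctl(fd, SG_IORECEIVE) */
  }

If the master READ fails, the parked WRITE is ended with
::driver_status=DRIVER_SOFT (and, per the table above, an ENOSTR-flavoured
outcome), so a caller should check both responses when reaping with
SG_IORECEIVE.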