[v3,66/77] ncr5380: Fix soft lockups

Message ID 20151222011755.965510468@telegraphics.com.au (mailing list archive)
State New, archived

Commit Message

Finn Thain Dec. 22, 2015, 1:18 a.m. UTC
Because of the rudimentary design of the chip, it is necessary to poll the
SCSI bus signals during PIO and this tends to hog the CPU. The driver will
accept new commands while others execute, and this causes a soft lockup
because the workqueue item will not terminate until the issue queue is
emptied.

When exercising dmx3191d using sequential IO from dd, the driver is sent
512 KiB WRITE commands and 128 KiB READs. For a PIO transfer, the rate is
only about 300 KiB/s, so these are long-running commands. And although
PDMA may run at several MiB/s, interrupts are disabled for the duration
of the transfer.

Fix the unresponsiveness and soft lockup issues by calling cond_resched()
after each command is completed and by limiting max_sectors for drivers
that don't implement real DMA.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

---

Changed since v2:
- Moved max_sectors initialization to wrapper drivers. It isn't really
  relevant to the core driver and compile-time configuration using macros
  like REAL_DMA should be avoided.

---
 drivers/scsi/NCR5380.c       |    6 ++++--
 drivers/scsi/arm/cumana_1.c  |    1 +
 drivers/scsi/arm/oak.c       |    1 +
 drivers/scsi/atari_NCR5380.c |    6 ++++--
 drivers/scsi/dmx3191d.c      |    1 +
 drivers/scsi/dtc.c           |    1 +
 drivers/scsi/g_NCR5380.c     |    1 +
 drivers/scsi/mac_scsi.c      |    1 +
 drivers/scsi/pas16.c         |    1 +
 drivers/scsi/t128.c          |    1 +
 10 files changed, 16 insertions(+), 4 deletions(-)
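
The figures in the commit message can be turned into rough per-command latencies. The sketch below is illustrative only and is not part of the patch: the helper names are hypothetical, and it assumes that max_sectors counts 512-byte sectors and that the PIO rate is the approximate 300 KiB/s quoted above.

```c
#include <assert.h>

/* Assumption: max_sectors is expressed in 512-byte sectors. */
#define SECTOR_BYTES 512u

/* Largest single command, in KiB, allowed by a max_sectors limit. */
static unsigned int max_command_kib(unsigned int max_sectors)
{
	return max_sectors * SECTOR_BYTES / 1024u;
}

/* Milliseconds needed to move kib KiB at rate_kib_per_s KiB/s (rounded down). */
static unsigned int transfer_ms(unsigned int kib, unsigned int rate_kib_per_s)
{
	assert(rate_kib_per_s > 0);
	return kib * 1000u / rate_kib_per_s;
}
```

Under these assumptions, transfer_ms(512, 300) gives about 1.7 seconds for a 512 KiB WRITE, while transfer_ms(max_command_kib(128), 300) gives about 0.2 seconds for a command capped at max_sectors = 128.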

Comments

Hannes Reinecke Dec. 22, 2015, 8:03 a.m. UTC | #1
On 12/22/2015 02:18 AM, Finn Thain wrote:
> Because of the rudimentary design of the chip, it is necessary to poll the
> SCSI bus signals during PIO and this tends to hog the CPU. The driver will
> accept new commands while others execute, and this causes a soft lockup
> because the workqueue item will not terminate until the issue queue is
> emptied.
>
> When exercising dmx3191d using sequential IO from dd, the driver is sent
> 512 KiB WRITE commands and 128 KiB READs. For a PIO transfer, the rate is
> only about 300 KiB/s, so these are long-running commands. And although
> PDMA may run at several MiB/s, interrupts are disabled for the duration
> of the transfer.
>
> Fix the unresponsiveness and soft lockup issues by calling cond_resched()
> after each command is completed and by limiting max_sectors for drivers
> that don't implement real DMA.
>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
>
> ---
>
> Changed since v2:
> - Moved max_sectors initialization to wrapper drivers. It isn't really
>    relevant to the core driver and compile-time configuration using macros
>    like REAL_DMA should be avoided.
>
> ---
>   drivers/scsi/NCR5380.c       |    6 ++++--
>   drivers/scsi/arm/cumana_1.c  |    1 +
>   drivers/scsi/arm/oak.c       |    1 +
>   drivers/scsi/atari_NCR5380.c |    6 ++++--
>   drivers/scsi/dmx3191d.c      |    1 +
>   drivers/scsi/dtc.c           |    1 +
>   drivers/scsi/g_NCR5380.c     |    1 +
>   drivers/scsi/mac_scsi.c      |    1 +
>   drivers/scsi/pas16.c         |    1 +
>   drivers/scsi/t128.c          |    1 +
>   10 files changed, 16 insertions(+), 4 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes

Alan Cox Dec. 22, 2015, 11:39 a.m. UTC | #2
On Tue, 22 Dec 2015 12:18:44 +1100
Finn Thain <fthain@telegraphics.com.au> wrote:

> Because of the rudimentary design of the chip, it is necessary to poll the
> SCSI bus signals during PIO and this tends to hog the CPU. The driver will
> accept new commands while others execute, and this causes a soft lockup
> because the workqueue item will not terminate until the issue queue is
> emptied.
> 
> When exercising dmx3191d using sequential IO from dd, the driver is sent
> 512 KiB WRITE commands and 128 KiB READs. For a PIO transfer, the rate is
> only about 300 KiB/s, so these are long-running commands. And although
> PDMA may run at several MiB/s, interrupts are disabled for the duration
> of the transfer.
> 
> Fix the unresponsiveness and soft lockup issues by calling cond_resched()
> after each command is completed and by limiting max_sectors for drivers
> that don't implement real DMA.

Is there a reason for not doing some limiting in the DMA case too? A 512K
write command even with DMA on a low end 68K box introduces a second of
latency before another I/O can be scheduled.

Alan

Finn Thain Dec. 22, 2015, 1:47 p.m. UTC | #3
On Tue, 22 Dec 2015, One Thousand Gnomes wrote:

> On Tue, 22 Dec 2015 12:18:44 +1100 Finn Thain 
> <fthain@telegraphics.com.au> wrote:
> 
> > Because of the rudimentary design of the chip, it is necessary to poll 
> > the SCSI bus signals during PIO and this tends to hog the CPU. The 
> > driver will accept new commands while others execute, and this causes 
> > a soft lockup because the workqueue item will not terminate until the 
> > issue queue is emptied.
> > 
> > When exercising dmx3191d using sequential IO from dd, the driver is 
> > sent 512 KiB WRITE commands and 128 KiB READs. For a PIO transfer, the 
> > rate is only about 300 KiB/s, so these are long-running commands. 
> > And although PDMA may run at several MiB/s, interrupts are disabled 
> > for the duration of the transfer.
> > 
> > Fix the unresponsiveness and soft lockup issues by calling 
> > cond_resched() after each command is completed and by limiting 
> > max_sectors for drivers that don't implement real DMA.
> 
> Is there a reason for not doing some limiting in the DMA case too? A 
> 512K write command even with DMA on a low end 68K box introduces a 
> second of latency before another I/O can be scheduled.

The DMA case is the atari_scsi case. I'd like to think that atari_scsi 
would have only the latency issues that might be expected from any SCSI-2 
host adapter driver.

Unlike PDMA, interrupts are not disabled for these DMA transfers. Note 
that this patch isn't really relevant to DMA, because the main loop 
iterates only when done == 0, that is, !hostdata->dmalen.
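
The control flow described here can be modelled in user space. This is only a sketch of the loop's shape, not the driver code: struct host_model and main_loop_model() are hypothetical, a pthread mutex stands in for hostdata->lock, sched_yield() stands in for cond_resched(), and the connected counter stands in for pending PIO information transfers.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for the driver's per-host state. */
struct host_model {
	pthread_mutex_t lock;	/* plays the role of hostdata->lock */
	int connected;		/* PIO transfers still to be serviced */
	int dmalen;		/* non-zero while a DMA transfer is in flight */
	int yields;		/* how often the loop offered to reschedule */
};

/* Models the shape of the patched main loop, not its real contents. */
static void main_loop_model(struct host_model *h)
{
	int done;

	assert(h);
	do {
		done = 1;

		pthread_mutex_lock(&h->lock);
		if (h->connected > 0 && !h->dmalen) {
			h->connected--;	/* one PIO information transfer */
			done = 0;
		}
		pthread_mutex_unlock(&h->lock);

		/* The lock is dropped before offering to reschedule. */
		if (!done) {
			h->yields++;
			sched_yield();
		}
	} while (!done);
}
```

In this model, a host with three pending PIO transfers and dmalen == 0 goes around the loop and yields three times, whereas a host with a non-zero dmalen leaves done at 1 and exits after a single pass without yielding.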

Michael Schmitz Dec. 23, 2015, 12:42 a.m. UTC | #4
I'd like to think that, too - probably true for the Atari TT SCSI case
(can do scatter-gather, can do more than one command per LUN). Worse
for the Falcon SCSI which is the only one I can test (no
scatter-gather, one command per LUN, interrupt shared with IDE and IDE
driver locked out while SCSI command handled).

But that only affects balancing of I/O between IDE and SCSI drivers.
Is that what you are worried about, Alan?

Happy to test whether limiting max_sectors makes a difference in the DMA case.

Cheers,

  Michael




Patch

Index: linux/drivers/scsi/NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/NCR5380.c	2015-12-22 12:17:12.000000000 +1100
+++ linux/drivers/scsi/NCR5380.c	2015-12-22 12:17:15.000000000 +1100
@@ -890,10 +890,10 @@  static void NCR5380_main(struct work_str
 	struct scsi_cmnd *cmd;
 	int done;
 	
-	spin_lock_irq(&hostdata->lock);
 	do {
 		done = 1;
 
+		spin_lock_irq(&hostdata->lock);
 		while (!hostdata->connected &&
 		       (cmd = dequeue_next_cmd(instance))) {
 
@@ -930,8 +930,10 @@  static void NCR5380_main(struct work_str
 			NCR5380_information_transfer(instance);
 			done = 0;
 		}
+		spin_unlock_irq(&hostdata->lock);
+		if (!done)
+			cond_resched();
 	} while (!done);
-	spin_unlock_irq(&hostdata->lock);
 }
 
 #ifndef DONT_USE_INTR
Index: linux/drivers/scsi/atari_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/atari_NCR5380.c	2015-12-22 12:17:14.000000000 +1100
+++ linux/drivers/scsi/atari_NCR5380.c	2015-12-22 12:17:15.000000000 +1100
@@ -976,10 +976,10 @@  static void NCR5380_main(struct work_str
 	 * alter queues and touch the Falcon lock.
 	 */
 
-	spin_lock_irq(&hostdata->lock);
 	do {
 		done = 1;
 
+		spin_lock_irq(&hostdata->lock);
 		while (!hostdata->connected &&
 		       (cmd = dequeue_next_cmd(instance))) {
 
@@ -1026,8 +1026,10 @@  static void NCR5380_main(struct work_str
 			NCR5380_information_transfer(instance);
 			done = 0;
 		}
+		spin_unlock_irq(&hostdata->lock);
+		if (!done)
+			cond_resched();
 	} while (!done);
-	spin_unlock_irq(&hostdata->lock);
 }
 
 
Index: linux/drivers/scsi/arm/cumana_1.c
===================================================================
--- linux.orig/drivers/scsi/arm/cumana_1.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/arm/cumana_1.c	2015-12-22 12:17:15.000000000 +1100
@@ -209,6 +209,7 @@  static struct scsi_host_template cumanas
 	.use_clustering		= DISABLE_CLUSTERING,
 	.proc_name		= "CumanaSCSI-1",
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 
 static int cumanascsi1_probe(struct expansion_card *ec,
Index: linux/drivers/scsi/arm/oak.c
===================================================================
--- linux.orig/drivers/scsi/arm/oak.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/arm/oak.c	2015-12-22 12:17:15.000000000 +1100
@@ -115,6 +115,7 @@  static struct scsi_host_template oakscsi
 	.use_clustering		= DISABLE_CLUSTERING,
 	.proc_name		= "oakscsi",
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 
 static int oakscsi_probe(struct expansion_card *ec, const struct ecard_id *id)
Index: linux/drivers/scsi/dmx3191d.c
===================================================================
--- linux.orig/drivers/scsi/dmx3191d.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/dmx3191d.c	2015-12-22 12:17:15.000000000 +1100
@@ -62,6 +62,7 @@  static struct scsi_host_template dmx3191
 	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 
 static int dmx3191d_probe_one(struct pci_dev *pdev,
Index: linux/drivers/scsi/dtc.c
===================================================================
--- linux.orig/drivers/scsi/dtc.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/dtc.c	2015-12-22 12:17:15.000000000 +1100
@@ -453,5 +453,6 @@  static struct scsi_host_template driver_
 	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 #include "scsi_module.c"
Index: linux/drivers/scsi/g_NCR5380.c
===================================================================
--- linux.orig/drivers/scsi/g_NCR5380.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/g_NCR5380.c	2015-12-22 12:17:15.000000000 +1100
@@ -729,6 +729,7 @@  static struct scsi_host_template driver_
 	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 
 #include "scsi_module.c"
Index: linux/drivers/scsi/mac_scsi.c
===================================================================
--- linux.orig/drivers/scsi/mac_scsi.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/mac_scsi.c	2015-12-22 12:17:15.000000000 +1100
@@ -324,6 +324,7 @@  static struct scsi_host_template mac_scs
 	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 
 static int __init mac_scsi_probe(struct platform_device *pdev)
Index: linux/drivers/scsi/pas16.c
===================================================================
--- linux.orig/drivers/scsi/pas16.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/pas16.c	2015-12-22 12:17:15.000000000 +1100
@@ -563,6 +563,7 @@  static struct scsi_host_template driver_
 	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 #include "scsi_module.c"
 
Index: linux/drivers/scsi/t128.c
===================================================================
--- linux.orig/drivers/scsi/t128.c	2015-12-22 12:16:58.000000000 +1100
+++ linux/drivers/scsi/t128.c	2015-12-22 12:17:15.000000000 +1100
@@ -407,5 +407,6 @@  static struct scsi_host_template driver_
 	.cmd_per_lun		= 2,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.cmd_size		= NCR5380_CMD_SIZE,
+	.max_sectors		= 128,
 };
 #include "scsi_module.c"
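
As a rough illustration of what the .max_sectors = 128 entries above imply: assuming 512-byte sectors, each command is capped at 64 KiB, so I/O that previously arrived as one large command would arrive as several smaller ones, with a reschedule opportunity in NCR5380_main() after each completes. The helper below is a hypothetical sketch for counting those commands and is not kernel code.

```c
#include <assert.h>

/*
 * Number of commands needed to cover total_kib when no single command
 * may exceed limit_kib (rounds up).
 */
static unsigned int commands_needed(unsigned int total_kib,
				    unsigned int limit_kib)
{
	assert(limit_kib > 0);
	return (total_kib + limit_kib - 1) / limit_kib;
}
```

With a 64 KiB cap, commands_needed(512, 64) is 8 and commands_needed(128, 64) is 2, so the 512 KiB WRITE and 128 KiB READ sizes mentioned in the commit message would become eight and two commands respectively.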