Message ID | 1543442171-24863-2-git-send-email-linux@roeck-us.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] esp-pci: Fix status register write erase control | expand |
On 28/11/18 22:56, Guenter Roeck wrote: > The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers > to reflect a consistent state. However, it is possible that the registers > can change after RSTAT was read, but before RINTR is read. > > Guest OS qemu > -------- ---- > Read RSTAT > esp_command_complete() > RSTAT = STAT_ST > esp_dma_done() > RSTAT |= STAT_TC > RSEQ = 0 > RINTR = INTR_BS > > Read RSEQ > Read RINTR RINTR = 0 > RSTAT &= ~STAT_TC > RSEQ = SEQ_CD > > The guest OS would then try to handle INTR_BS combined with an old > value of RSTAT. This sometimes resulted in lost events, spurious > interrupts, guest OS confusion, and stalled SCSI operations. The question is, why was the guest running the interrupt routine before STAT_INT was set in RSTAT? The code in esp_raise_irq seems good: if (!(s->rregs[ESP_RSTAT] & STAT_INT)) { s->rregs[ESP_RSTAT] |= STAT_INT; qemu_irq_raise(s->irq); trace_esp_raise_irq(); } Paolo > A typical guest error log (observed with various versions of Linux) > looks as follows. > > scsi host1: Spurious irq, sreg=13. > ... > scsi host1: Aborting command [84531f10:2a] > scsi host1: Current command [f882eea8:35] > scsi host1: Queued command [84531f10:2a] > scsi host1: Active command [f882eea8:35] > scsi host1: Dumping command log > scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c] > scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] > scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] > scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c] > ...
On 29/11/2018 09:58, Paolo Bonzini wrote: > On 28/11/18 22:56, Guenter Roeck wrote: >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers >> to reflect a consistent state. However, it is possible that the registers >> can change after RSTAT was read, but before RINTR is read. >> >> Guest OS qemu >> -------- ---- >> Read RSTAT >> esp_command_complete() >> RSTAT = STAT_ST >> esp_dma_done() >> RSTAT |= STAT_TC >> RSEQ = 0 >> RINTR = INTR_BS >> >> Read RSEQ >> Read RINTR RINTR = 0 >> RSTAT &= ~STAT_TC >> RSEQ = SEQ_CD >> >> The guest OS would then try to handle INTR_BS combined with an old >> value of RSTAT. This sometimes resulted in lost events, spurious >> interrupts, guest OS confusion, and stalled SCSI operations. > > The question is, why was the guest running the interrupt routine before > STAT_INT was set in RSTAT? The code in esp_raise_irq seems good: > > if (!(s->rregs[ESP_RSTAT] & STAT_INT)) { > s->rregs[ESP_RSTAT] |= STAT_INT; > qemu_irq_raise(s->irq); > trace_esp_raise_irq(); > } > > Paolo This patch is very interesting, as I have a long-running regression trying to boot NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that turned on iothread by default in QEMU. The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1 qemu-system-sparc ..." then it works and you can complete the installation. So certainly this suggests that there is a race condition still present in ESP somewhere. I've given this patch a spin, and in a few quick tests here I was able to consistently get further in kernel boot, but it still doesn't completely solve issue for me :/ ATB, Mark.
On 11/29/18 1:58 AM, Paolo Bonzini wrote: > On 28/11/18 22:56, Guenter Roeck wrote: >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers >> to reflect a consistent state. However, it is possible that the registers >> can change after RSTAT was read, but before RINTR is read. >> >> Guest OS qemu >> -------- ---- >> Read RSTAT >> esp_command_complete() >> RSTAT = STAT_ST >> esp_dma_done() >> RSTAT |= STAT_TC >> RSEQ = 0 >> RINTR = INTR_BS >> >> Read RSEQ >> Read RINTR RINTR = 0 >> RSTAT &= ~STAT_TC >> RSEQ = SEQ_CD >> >> The guest OS would then try to handle INTR_BS combined with an old >> value of RSTAT. This sometimes resulted in lost events, spurious >> interrupts, guest OS confusion, and stalled SCSI operations. > > The question is, why was the guest running the interrupt routine before > STAT_INT was set in RSTAT? The code in esp_raise_irq seems good: > It doesn't. It polls for interrupts, but only enters the interrupt handler if one was observed. It uses the DMA status (see first patch) for the polling, presumably to not disturb RSTAT. Qemu produces two interrupts in a row, the second while the first one is still handled. Also, it works fine with a real controller (yes, I did buy one, and an old SCSI drive, to test). Guenter > if (!(s->rregs[ESP_RSTAT] & STAT_INT)) { > s->rregs[ESP_RSTAT] |= STAT_INT; > qemu_irq_raise(s->irq); > trace_esp_raise_irq(); > } > > Paolo > >> A typical guest error log (observed with various versions of Linux) >> looks as follows. >> >> scsi host1: Spurious irq, sreg=13. >> ... >> scsi host1: Aborting command [84531f10:2a] >> scsi host1: Current command [f882eea8:35] >> scsi host1: Queued command [84531f10:2a] >> scsi host1: Active command [f882eea8:35] >> scsi host1: Dumping command log >> scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c] >> scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] >> scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] >> scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c] >> ... > >
On Thu, Nov 29, 2018 at 11:56:39AM +0000, Mark Cave-Ayland wrote: > On 29/11/2018 09:58, Paolo Bonzini wrote: > > > On 28/11/18 22:56, Guenter Roeck wrote: > >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers > >> to reflect a consistent state. However, it is possible that the registers > >> can change after RSTAT was read, but before RINTR is read. > >> > >> Guest OS qemu > >> -------- ---- > >> Read RSTAT > >> esp_command_complete() > >> RSTAT = STAT_ST > >> esp_dma_done() > >> RSTAT |= STAT_TC > >> RSEQ = 0 > >> RINTR = INTR_BS > >> > >> Read RSEQ > >> Read RINTR RINTR = 0 > >> RSTAT &= ~STAT_TC > >> RSEQ = SEQ_CD > >> > >> The guest OS would then try to handle INTR_BS combined with an old > >> value of RSTAT. This sometimes resulted in lost events, spurious > >> interrupts, guest OS confusion, and stalled SCSI operations. > > > > The question is, why was the guest running the interrupt routine before > > STAT_INT was set in RSTAT? The code in esp_raise_irq seems good: > > > > if (!(s->rregs[ESP_RSTAT] & STAT_INT)) { > > s->rregs[ESP_RSTAT] |= STAT_INT; > > qemu_irq_raise(s->irq); > > trace_esp_raise_irq(); > > } > > > > Paolo > > This patch is very interesting, as I have a long-running regression trying to boot > NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that > turned on iothread by default in QEMU. > Makes me wonder: Is there a means to hold up processing esp_command_complete() until a pending interrupt has been handled ? That might be a cleaner solution. I could try to implement that in the driver, but maybe there is a means to do that using the infrastructure. Thanks, Guenter > The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get > to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1 > qemu-system-sparc ..." then it works and you can complete the installation. > > So certainly this suggests that there is a race condition still present in ESP > somewhere. I've given this patch a spin, and in a few quick tests here I was able to > consistently get further in kernel boot, but it still doesn't completely solve issue > for me :/ > > > ATB, > > Mark.
Hi Mark, On Thu, Nov 29, 2018 at 11:56:39AM +0000, Mark Cave-Ayland wrote: > On 29/11/2018 09:58, Paolo Bonzini wrote: > > > On 28/11/18 22:56, Guenter Roeck wrote: > >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers > >> to reflect a consistent state. However, it is possible that the registers > >> can change after RSTAT was read, but before RINTR is read. > >> > >> Guest OS qemu > >> -------- ---- > >> Read RSTAT > >> esp_command_complete() > >> RSTAT = STAT_ST > >> esp_dma_done() > >> RSTAT |= STAT_TC > >> RSEQ = 0 > >> RINTR = INTR_BS > >> > >> Read RSEQ > >> Read RINTR RINTR = 0 > >> RSTAT &= ~STAT_TC > >> RSEQ = SEQ_CD > >> > >> The guest OS would then try to handle INTR_BS combined with an old > >> value of RSTAT. This sometimes resulted in lost events, spurious > >> interrupts, guest OS confusion, and stalled SCSI operations. > > > > The question is, why was the guest running the interrupt routine before > > STAT_INT was set in RSTAT? The code in esp_raise_irq seems good: > > > > if (!(s->rregs[ESP_RSTAT] & STAT_INT)) { > > s->rregs[ESP_RSTAT] |= STAT_INT; > > qemu_irq_raise(s->irq); > > trace_esp_raise_irq(); > > } > > > > Paolo > > This patch is very interesting, as I have a long-running regression trying to boot > NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that > turned on iothread by default in QEMU. > > The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get > to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1 > qemu-system-sparc ..." then it works and you can complete the installation. > > So certainly this suggests that there is a race condition still present in ESP > somewhere. I've given this patch a spin, and in a few quick tests here I was able to > consistently get further in kernel boot, but it still doesn't completely solve issue > for me :/ > Can you try the attached patch ? It is a bit cleaner than the first version, and works for me as well. Note that this isn't perfect. Specifically, I see differences in handling STAT_TC. The controller specification is a bit ambiguous in that regard, but comparing the qemu code with real controller behavior shows that the real controller does not reset STAT_TC when reading the interrupt status register. That doesn't seem to matter for Linux, but it may influence other guests. Guenter From 751edd383f3c70d8a9195345b45aa3eb1ada7553 Mon Sep 17 00:00:00 2001 From: Guenter Roeck <linux@roeck-us.net> Date: Thu, 29 Nov 2018 09:17:42 -0800 Subject: [PATCH] scsi: esp: Defer command completion until previous interrupts have been handled The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers to reflect a consistent state. However, it is possible that the registers can change after RSTAT was read, but before RINTR is read, when esp_command_complete() is called. Guest OS qemu -------- ---- [handle interrupt] Read RSTAT esp_command_complete() RSTAT = STAT_ST esp_dma_done() RSTAT |= STAT_TC RSEQ = 0 RINTR = INTR_BS Read RSEQ Read RINTR RINTR = 0 RSTAT &= ~STAT_TC RSEQ = SEQ_CD The guest OS would then try to handle INTR_BS combined with an old value of RSTAT. This sometimes resulted in lost events, spurious interrupts, guest OS confusion, and stalled SCSI operations. A typical guest error log (observed with various versions of Linux) looks as follows. scsi host1: Spurious irq, sreg=13. ... scsi host1: Aborting command [84531f10:2a] scsi host1: Current command [f882eea8:35] scsi host1: Queued command [84531f10:2a] scsi host1: Active command [f882eea8:35] scsi host1: Dumping command log scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c] scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c] ... Defer handling command completion until previous interrupts have been handled to fix the problem. Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- This looks much cleaner than the previous patch and at the same time identifies the problem. Enabling the new trace shows that the condition is seen on a regular basis (on average 2-3 times per boot test with qemu-system-hppa if the test does nothing but shut down immediately). hw/scsi/esp.c | 29 ++++++++++++++++++++++++----- hw/scsi/trace-events | 1 + include/hw/scsi/esp.h | 2 ++ 3 files changed, 27 insertions(+), 5 deletions(-) diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c index 630d923..e3d6cc5 100644 --- a/hw/scsi/esp.c +++ b/hw/scsi/esp.c @@ -286,11 +286,8 @@ static void esp_do_dma(ESPState *s) esp_dma_done(s); } -void esp_command_complete(SCSIRequest *req, uint32_t status, - size_t resid) +static void _esp_command_complete(ESPState *s, uint32_t status) { - ESPState *s = req->hba_private; - trace_esp_command_complete(); if (s->ti_size != 0) { trace_esp_command_complete_unexpected(); @@ -311,6 +308,23 @@ void esp_command_complete(SCSIRequest *req, uint32_t status, } } +void esp_command_complete(SCSIRequest *req, uint32_t status, + size_t resid) +{ + ESPState *s = req->hba_private; + + if (s->rregs[ESP_RSTAT] & STAT_INT) { + /* Defer handling command complete until the previous + * interrupt has been handled. + */ + trace_esp_command_complete_deferred(); + s->deferred_status = status; + s->deferred_complete = true; + return; + } + _esp_command_complete(s, status); +} + void esp_transfer_data(SCSIRequest *req, uint32_t len) { ESPState *s = req->hba_private; @@ -422,7 +436,10 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr) s->rregs[ESP_RSTAT] &= ~STAT_TC; s->rregs[ESP_RSEQ] = SEQ_CD; esp_lower_irq(s); - + if (s->deferred_complete) { + _esp_command_complete(s, s->deferred_status); + s->deferred_complete = false; + } return old_val; case ESP_TCHI: /* Return the unique id if the value has never been written */ @@ -582,6 +599,8 @@ const VMStateDescription vmstate_esp = { VMSTATE_UINT32(ti_wptr, ESPState), VMSTATE_BUFFER(ti_buf, ESPState), VMSTATE_UINT32(status, ESPState), + VMSTATE_UINT32(deferred_status, ESPState), + VMSTATE_BOOL(deferred_complete, ESPState), VMSTATE_UINT32(dma, ESPState), VMSTATE_PARTIAL_BUFFER(cmdbuf, ESPState, 16), VMSTATE_BUFFER_START_MIDDLE_V(cmdbuf, ESPState, 16, 4), diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events index 0fb6a99..2fe8a7c 100644 --- a/hw/scsi/trace-events +++ b/hw/scsi/trace-events @@ -167,6 +167,7 @@ esp_handle_satn_stop(uint32_t cmdlen) "cmdlen %d" esp_write_response(uint32_t status) "Transfer status (status=%d)" esp_do_dma(uint32_t cmdlen, uint32_t len) "command len %d + %d" esp_command_complete(void) "SCSI Command complete" +esp_command_complete_deferred(void) "SCSI Command complete deferred" esp_command_complete_unexpected(void) "SCSI command completed unexpectedly" esp_command_complete_fail(void) "Command failed" esp_transfer_data(uint32_t dma_left, int32_t ti_size) "transfer %d/%d" diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h index 682a0d2..adab63d 100644 --- a/include/hw/scsi/esp.h +++ b/include/hw/scsi/esp.h @@ -23,6 +23,8 @@ struct ESPState { int32_t ti_size; uint32_t ti_rptr, ti_wptr; uint32_t status; + uint32_t deferred_status; + bool deferred_complete; uint32_t dma; uint8_t ti_buf[TI_BUFSZ]; SCSIBus bus;
On 29/11/18 18:38, Guenter Roeck wrote: > This looks much cleaner than the previous patch and at the same time > identifies the problem. Enabling the new trace shows that the condition > is seen on a regular basis (on average 2-3 times per boot test with > qemu-system-hppa if the test does nothing but shut down immediately). Yes, this looks very nice. I'll queue it if Mark reports that it works, with just s/_esp_command_complete/esp_report_command_complete/. Paolo
On 29/11/2018 17:38, Guenter Roeck wrote: >> This patch is very interesting, as I have a long-running regression trying to boot >> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that >> turned on iothread by default in QEMU. >> >> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get >> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1 >> qemu-system-sparc ..." then it works and you can complete the installation. >> >> So certainly this suggests that there is a race condition still present in ESP >> somewhere. I've given this patch a spin, and in a few quick tests here I was able to >> consistently get further in kernel boot, but it still doesn't completely solve issue >> for me :/ >> > > Can you try the attached patch ? It is a bit cleaner than the first version, > and works for me as well. > > Note that this isn't perfect. Specifically, I see differences in handling > STAT_TC. The controller specification is a bit ambiguous in that regard, > but comparing the qemu code with real controller behavior shows that the > real controller does not reset STAT_TC when reading the interrupt status > register. That doesn't seem to matter for Linux, but it may influence > other guests. Hi Guenter, Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO still hangs in the same place on boot :( Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'" from startup all the way to the point where the kernel hangs on boot whilst enumerating the SCSI bus (it does seem to hang at random points in the bus enumeration process). ATB, Mark.
On 29/11/2018 17:38, Guenter Roeck wrote: > Can you try the attached patch ? It is a bit cleaner than the first version, > and works for me as well. > > Note that this isn't perfect. Specifically, I see differences in handling > STAT_TC. The controller specification is a bit ambiguous in that regard, > but comparing the qemu code with real controller behavior shows that the > real controller does not reset STAT_TC when reading the interrupt status > register. That doesn't seem to matter for Linux, but it may influence > other guests. I've now completed a boot test of all my SPARC32 OpenBIOS CDROM images with this patch, and whilst it doesn't solve my NextSTEP issue, I don't see any obvious regressions. Note that NetBSD SPARC32 tends to spit out the occasional "!TC on data xfer" message to the console during periods of disk access, however that is something that has always happened and isn't something new introduced by this patch. ATB, Mark.
On Thu, Nov 29, 2018 at 06:07:05PM +0000, Mark Cave-Ayland wrote: > On 29/11/2018 17:38, Guenter Roeck wrote: > > >> This patch is very interesting, as I have a long-running regression trying to boot > >> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that > >> turned on iothread by default in QEMU. > >> > >> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get > >> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1 > >> qemu-system-sparc ..." then it works and you can complete the installation. > >> > >> So certainly this suggests that there is a race condition still present in ESP > >> somewhere. I've given this patch a spin, and in a few quick tests here I was able to > >> consistently get further in kernel boot, but it still doesn't completely solve issue > >> for me :/ > >> > > > > Can you try the attached patch ? It is a bit cleaner than the first version, > > and works for me as well. > > > > Note that this isn't perfect. Specifically, I see differences in handling > > STAT_TC. The controller specification is a bit ambiguous in that regard, > > but comparing the qemu code with real controller behavior shows that the > > real controller does not reset STAT_TC when reading the interrupt status > > register. That doesn't seem to matter for Linux, but it may influence > > other guests. > > Hi Guenter, > > Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO > still hangs in the same place on boot :( > Too bad. Is it "same place" as with the first version of the patch, or "same place" as in upstream qemu ? That might be important, as the two patch versions behave differently (one caches RSTAT/RINTR/RSEQ, one defers command complete handling). > Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'" > from startup all the way to the point where the kernel hangs on boot whilst > enumerating the SCSI bus (it does seem to hang at random points in the bus > enumeration process). > This is interesting; yours seems to be a different problem. I don't see any command_complete_deferred traces in your log. I also don't see any suspicious activity between esp_raise_irq and esp_lower_irq. Can you try tracing in singlethreaded mode ? Maybe that can help us finding the difference. Thanks, Guenter
On Thu, Nov 29, 2018 at 06:34:54PM +0000, Mark Cave-Ayland wrote: > On 29/11/2018 17:38, Guenter Roeck wrote: > > > Can you try the attached patch ? It is a bit cleaner than the first version, > > and works for me as well. > > > > Note that this isn't perfect. Specifically, I see differences in handling > > STAT_TC. The controller specification is a bit ambiguous in that regard, > > but comparing the qemu code with real controller behavior shows that the > > real controller does not reset STAT_TC when reading the interrupt status > > register. That doesn't seem to matter for Linux, but it may influence > > other guests. > > I've now completed a boot test of all my SPARC32 OpenBIOS CDROM images with this > patch, and whilst it doesn't solve my NextSTEP issue, I don't see any obvious > regressions. > > Note that NetBSD SPARC32 tends to spit out the occasional "!TC on data xfer" message > to the console during periods of disk access, however that is something that has > always happened and isn't something new introduced by this patch. > That may be because reading the interrupt status resets the TC bit. As mentioned above, I think it shouldn't do that. Just a wild guess, but it might be worth a try. Can you remove "s->rregs[ESP_RSTAT] &= ~STAT_TC;" from the ESP_RINTR case in esp_reg_read() and see what happens ? [That may expose situations where STAT_TC _should_ be cleared but isn't, so we may hit other problems when doing that.] Thanks, Guenter
On 29/11/2018 19:00, Guenter Roeck wrote: >> Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO >> still hangs in the same place on boot :( >> > Too bad. Is it "same place" as with the first version of the patch, or > "same place" as in upstream qemu ? That might be important, as the two patch > versions behave differently (one caches RSTAT/RINTR/RSEQ, one defers command > complete handling). I'd say same place as in upstream QEMU, although again the exact location does vary so I have to keep running it several times to get a feel for whether or not it is in improvement. >> Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'" >> from startup all the way to the point where the kernel hangs on boot whilst >> enumerating the SCSI bus (it does seem to hang at random points in the bus >> enumeration process). >> > This is interesting; yours seems to be a different problem. I don't see any > command_complete_deferred traces in your log. I also don't see any suspicious > activity between esp_raise_irq and esp_lower_irq. > > Can you try tracing in singlethreaded mode ? Maybe that can help us finding > the difference. Attached are the "-trace 'esp*' -trace 'scsi*'" outputs for a bad (normal) boot and good (single threaded) boot for comparison. ATB, Mark.
On 29/11/2018 19:07, Guenter Roeck wrote: >> I've now completed a boot test of all my SPARC32 OpenBIOS CDROM images with this >> patch, and whilst it doesn't solve my NextSTEP issue, I don't see any obvious >> regressions. >> >> Note that NetBSD SPARC32 tends to spit out the occasional "!TC on data xfer" message >> to the console during periods of disk access, however that is something that has >> always happened and isn't something new introduced by this patch. >> > > That may be because reading the interrupt status resets the TC bit. > As mentioned above, I think it shouldn't do that. Just a wild guess, but > it might be worth a try. Can you remove "s->rregs[ESP_RSTAT] &= ~STAT_TC;" > from the ESP_RINTR case in esp_reg_read() and see what happens ? > > [That may expose situations where STAT_TC _should_ be cleared but isn't, > so we may hit other problems when doing that.] I've tried that, and over a number of boots it does seem to allow the boot to get further: with this change I'd say around 95% of the time NextSTEP now gets past the SCSI bus enumeration and hangs at the point where it tries to mount the root filesystem. Interestingly enough even if I comment out that line I still see the "!TC on data xfer" messages appearing on the NetBSD console... ATB, Mark.
On Thu, Nov 29, 2018 at 07:33:15PM +0000, Mark Cave-Ayland wrote: > On 29/11/2018 19:00, Guenter Roeck wrote: > > >> Thanks for the patch. I just gave it a quick test, and unfortunately my NextSTEP ISO > >> still hangs in the same place on boot :( > >> > > Too bad. Is it "same place" as with the first version of the patch, or > > "same place" as in upstream qemu ? That might be important, as the two patch > > versions behave differently (one caches RSTAT/RINTR/RSEQ, one defers command > > complete handling). > > I'd say same place as in upstream QEMU, although again the exact location does vary > so I have to keep running it several times to get a feel for whether or not it is in > improvement. > > >> Not sure if it helps, but attached is a simple trace backend log from "-trace 'esp*'" > >> from startup all the way to the point where the kernel hangs on boot whilst > >> enumerating the SCSI bus (it does seem to hang at random points in the bus > >> enumeration process). > >> > > This is interesting; yours seems to be a different problem. I don't see any > > command_complete_deferred traces in your log. I also don't see any suspicious > > activity between esp_raise_irq and esp_lower_irq. > > > > Can you try tracing in singlethreaded mode ? Maybe that can help us finding > > the difference. > > Attached are the "-trace 'esp*' -trace 'scsi*'" outputs for a bad (normal) boot and > good (single threaded) boot for comparison. > This is really weird. The "bad" log just stops. Up to that point, the "good" log is virtually identical (liteally up to line 83602 in both logs). Is it possible that there is some qemu-internal race condition that has nothing to do with esp itself, one that causes qemu to lock up ? Thanks, Guenter
diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c index 630d923..6af74bc 100644 --- a/hw/scsi/esp.c +++ b/hw/scsi/esp.c @@ -415,6 +415,16 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr) } break; case ESP_RINTR: + if (s->rintr_saved) { + old_val = s->rintr_saved; + s->rintr_saved = 0; + if (!(s->rregs[ESP_RINTR])) { + s->rregs[ESP_RSTAT] &= ~STAT_TC; + s->rregs[ESP_RSEQ] = SEQ_CD; + esp_lower_irq(s); + } + return old_val; + } /* Clear sequence step, interrupt register and all status bits except TC */ old_val = s->rregs[ESP_RINTR]; @@ -429,6 +439,34 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr) if (!s->tchi_written) { return s->chip_id; } + case ESP_RSTAT: + /* + * After receiving an interrupt, the guest OS reads + * RSTAT, RSEQ, and RINTR. When reading RINTR, + * RSTAT and RSEQ are updated. It was observed that + * esp_command_complete() and with it esp_dma_done() + * was called after the guest OS reads RSTAT, but + * before it was able to read RINTR. In other words, + * the host would read RINTR associated with the + * old value of RSTAT, not the new value. Since RSTAT + * was supposed to reflect STAT_ST in this situation, + * the host OS got confused, and the operation stalled. + * Remedy the situation by caching both ESP_RINTR and + * ESP_RSEQ. Return those values until ESP_RINTR is read. + * Only do this if an interrupt is pending to limit its + * impact. + */ + if (s->rregs[ESP_RSTAT] & STAT_INT) { + s->rintr_saved = s->rregs[ESP_RINTR]; + s->rseq_saved = s->rregs[ESP_RSEQ]; + s->rregs[ESP_RINTR] = 0; + } + break; + case ESP_RSEQ: + if (s->rintr_saved) { + return s->rseq_saved; + } + break; default: break; } @@ -577,6 +615,8 @@ const VMStateDescription vmstate_esp = { .fields = (VMStateField[]) { VMSTATE_BUFFER(rregs, ESPState), VMSTATE_BUFFER(wregs, ESPState), + VMSTATE_UINT8(rintr_saved, ESPState), + VMSTATE_UINT8(rseq_saved, ESPState), VMSTATE_INT32(ti_size, ESPState), VMSTATE_UINT32(ti_rptr, ESPState), VMSTATE_UINT32(ti_wptr, ESPState), diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h index 682a0d2..342f607 100644 --- a/include/hw/scsi/esp.h +++ b/include/hw/scsi/esp.h @@ -17,6 +17,8 @@ typedef struct ESPState ESPState; struct ESPState { uint8_t rregs[ESP_REGS]; uint8_t wregs[ESP_REGS]; + uint8_t rintr_saved; + uint8_t rseq_saved; qemu_irq irq; uint8_t chip_id; bool tchi_written;
The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers to reflect a consistent state. However, it is possible that the registers can change after RSTAT was read, but before RINTR is read. Guest OS qemu -------- ---- Read RSTAT esp_command_complete() RSTAT = STAT_ST esp_dma_done() RSTAT |= STAT_TC RSEQ = 0 RINTR = INTR_BS Read RSEQ Read RINTR RINTR = 0 RSTAT &= ~STAT_TC RSEQ = SEQ_CD The guest OS would then try to handle INTR_BS combined with an old value of RSTAT. This sometimes resulted in lost events, spurious interrupts, guest OS confusion, and stalled SCSI operations. A typical guest error log (observed with various versions of Linux) looks as follows. scsi host1: Spurious irq, sreg=13. ... scsi host1: Aborting command [84531f10:2a] scsi host1: Current command [f882eea8:35] scsi host1: Queued command [84531f10:2a] scsi host1: Active command [f882eea8:35] scsi host1: Dumping command log scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c] scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c] scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c] ... Adress the situation by caching RINTR and RSEQ when RSTAT is read and using the cached values when the respective registers are read. Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- I am not too happy with this solution (it looks kludgy to me), but it does work. With this series applied, I have not seen a single spurious interrupt after hundreds of boots, and no stalls. Prior to that, spurious interrupts were seen with pretty much every boot, and the stall occurred on a regular basis (roughly every other boot with qemu-system-hppa, less with others). If anyone has a better idea, please let me know, and I'll be happy to test it. hw/scsi/esp.c | 40 ++++++++++++++++++++++++++++++++++++++++ include/hw/scsi/esp.h | 2 ++ 2 files changed, 42 insertions(+)