[kvm-Bugs-1895893] KVM-60+ halts, when using SCSI

Message ID E1OQ5JF-0004wy-T3@sfs-web-8.v29.ch3.sourceforge.com (mailing list archive)
State New, archived

Commit Message

SourceForge.net June 19, 2010, 9:10 p.m. UTC

Patch

====================================================
Dmesg shows:

apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0

...looping forever.

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------

>Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-19 23:10

Message:
Hi,

I verified with Marcelo (mtosatti) and the bug is supposed to be fixed.
Since there has been no activity in this one for more than two years I
assume that is the case.

If it reappears please open a new bug in launchpad.

Thanks,
Jes


----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-28 22:11

Message:
Logged In: YES 
user_id=1865908
Originator: NO

OK.  I've applied the matley patch and your debug patch to KVM 66.  I've
also been able to reproduce the problem on a raw SCSI disk while installing
Windows 2003.  You can find the log at:

http://mel.byu.edu/kvm-scsi-debug.log.bz2

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2008-04-27 01:45

Message:
Logged In: YES 
user_id=2022487
Originator: NO

Alexey, Alberto,

I'm unable to reproduce the problem with the Linux driver.

The Windows SCSI SCRIPTS code is different, so that might be the reason.
The state machine is relatively complex and depends on this SCRIPTS code.

Please try the following:

1 - Attempt to reproduce the problem with raw disk instead of qcow2.
2 - Apply matley's patch below, and on top of that, this debug patch:
http://people.redhat.com/~mtosatti/lsi-debug-crash.patch

And then run qemu-kvm as usual, but redirect stderr output to a file:

# qemu-kvm options 2> log-scsi-crash.txt

Once the crash happens, there should be a pattern that repeats in this
output. With that information it's easier to understand what is going on.

Thanks.


----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-12 00:46

Message:
Logged In: YES 
user_id=1865908
Originator: NO

I've confirmed the problem with KVM-65 as well.  I applied the patch but
it didn't work; I still experienced lockups.  I am trying to install
Windows Server 2003 on a SCSI disk and the installation keeps locking up on
different parts of the file copy process.  I'm using qcow2 disk format.  I
tried using raw format and it would lock up consistently when formatting
the disk.  I have tried installing W2K3 at least a dozen times with the
same lockups.  As part of my configuration, I move the monitor to run on a
telnet server.  When the lockup occurs, I can't connect to the monitor via
telnet.

I am also experiencing boot problems with Grub on SCSI disks.  I reported
the problem on the mailing list:

http://article.gmane.org/gmane.comp.emulators.kvm.devel/15884

I don't know if the problems are related.

----------------------------------------------------------------------

Comment By: lanconnected (lanconnected)
Date: 2008-04-08 18:17

Message:
Logged In: YES 
user_id=2041746
Originator: NO

Applied proposed patch on kvm-65. Windows XP Pro can be installed on scsi
disk and boots up, but hangs unpredictably during disk activity. SDL
windows can't be closed, kvm can only be killed with kill -9.

----------------------------------------------------------------------

Comment By: Matteo Frigo (matley)
Date: 2008-03-30 14:58

Message:
Logged In: YES 
user_id=35769
Originator: NO

The bug seems to have nothing to do with Windows.  You can reproduce the
bug in kvm-63 and kvm-64 by creating an empty qcow2 scsi disk and running
``dd if=/dev/sda of=/dev/null bs=1M'' in linux.

The patch below seems to fix the problem (at least with linux, I haven't
tried Windows).  If I understand the AIO layer correctly, scsi_read_data()
and scsi_write_data() can be called again before the bdrv_aio_read
call returns.  If this happens, the original code reissues the same
request twice, which is incorrect.  The patch updates the read/write
counters (r->sector and r->sector_count) before invoking the AIO layer.
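
To make the ordering issue concrete, here is a minimal sketch of the
reentrancy described above.  It is a toy model, not QEMU code: Request,
fake_aio_read(), read_data(), run(), CHUNK and MAX_LOG are all
hypothetical names, and the fake AIO layer is assumed to complete
synchronously and call straight back into the caller, which is the
situation the paragraph above speculates about.

```c
#include <assert.h>

#define MAX_LOG 16  /* cap on logged requests so the demo terminates */
#define CHUNK   4   /* sectors per request (toy value) */

/* Hypothetical stand-in for the per-request state. */
typedef struct Request {
    int sector;        /* next sector to transfer */
    int sector_count;  /* sectors still to transfer */
    int update_first;  /* 1 = counters updated before submit, 0 = after */
} Request;

static int issued[MAX_LOG];  /* start sector of every submitted request */
static int n_issued;

static void read_data(Request *r);

/* Toy AIO submit: logs the request, then "completes" it synchronously,
 * so the caller is re-entered before this function returns. */
static void fake_aio_read(Request *r, int sector)
{
    if (n_issued == MAX_LOG)
        return;               /* stop instead of recursing forever */
    issued[n_issued++] = sector;
    read_data(r);             /* completion path asks for more data */
}

static void read_data(Request *r)
{
    int n = r->sector_count < CHUNK ? r->sector_count : CHUNK;

    if (n <= 0)
        return;

    if (r->update_first) {
        /* Ordering after the patch: a re-entered call sees new counters. */
        r->sector += n;
        r->sector_count -= n;
        fake_aio_read(r, r->sector - n);
    } else {
        /* Original ordering: a re-entered call sees stale counters. */
        fake_aio_read(r, r->sector);
        r->sector += n;
        r->sector_count -= n;
    }
}

/* Runs one transfer of `total` sectors and returns how many requests
 * repeated the start sector of the request before them. */
static int run(int update_first, int total)
{
    Request r = { 0, total, update_first };
    int dups = 0;

    assert(total >= 0);
    n_issued = 0;
    read_data(&r);

    for (int i = 1; i < n_issued; i++)
        if (issued[i] == issued[i - 1])
            dups++;
    return dups;
}
```

In this model run(0, 12) resubmits sector 0 until the log cap is hit and
returns 15, while run(1, 12) submits sectors 0, 4 and 8 once each and
returns 0.  Whether the real AIO layer re-enters in exactly this way is
the assumption being tested by the patch below.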

diff -aur kvm-64.old/qemu/hw/scsi-disk.c kvm-64.new/qemu/hw/scsi-disk.c
--- kvm-64.old/qemu/hw/scsi-disk.c	2008-03-26 08:49:35.000000000 -0400
+++ kvm-64.new/qemu/hw/scsi-disk.c	2008-03-30 08:37:25.000000000 -0400
@@ -196,12 +196,12 @@ 
         n = SCSI_DMA_BUF_SIZE / 512;
 
     r->buf_len = n * 512;
-    r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
+    r->sector += n;
+    r->sector_count -= n;
+    r->aiocb = bdrv_aio_read(s->bdrv, r->sector - n, r->dma_buf, n,
                              scsi_read_complete, r);
     if (r->aiocb == NULL)
         scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-    r->sector += n;
-    r->sector_count -= n;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
@@ -248,12 +248,12 @@ 
         BADF("Data transfer already in progress\n");
     n = r->buf_len / 512;
     if (n) {
-        r->aiocb = bdrv_aio_write(s->bdrv, r->sector, r->dma_buf, n,
+        r->sector += n;
+        r->sector_count -= n;
+        r->aiocb = bdrv_aio_write(s->bdrv, r->sector - n, r->dma_buf, n,
                                   scsi_write_complete, r);
         if (r->aiocb == NULL)
             scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-        r->sector += n;
-        r->sector_count -= n;
     } else {
         /* Invoke completion routine to fetch data from host.  */
         scsi_write_complete(r, 0);