From patchwork Sat Jun 19 21:10:37 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "SourceForge.net"
X-Patchwork-Id: 107011
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by demeter.kernel.org (8.14.3/8.14.3) with ESMTP id o5JLAiG5016343
 for ; Sat, 19 Jun 2010 21:10:44 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1751877Ab0FSVKm (ORCPT ); Sat, 19 Jun 2010 17:10:42 -0400
Received: from ch3.sourceforge.net ([216.34.181.60]:50149 "EHLO
 ch3.sourceforge.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751346Ab0FSVKl (ORCPT ); Sat, 19 Jun 2010 17:10:41 -0400
Received: from www by sfs-web-8.v29.ch3.sourceforge.com with local (Exim 4.69)
 (envelope-from ) id 1OQ5JF-0004wy-T3; Sat, 19 Jun 2010 21:10:37 +0000
To: noreply@sourceforge.net
From: "SourceForge.net"
Subject: [ kvm-Bugs-1895893 ] KVM-60+ halts, when using SCSI
Mime-Version: 1.0
X-SourceForge-Tracker-unixname: kvm
X-SourceForge-Tracker-trackerid: 893831
X-SourceForge-Tracker-itemid: 1895893
X-SourceForge-Tracker-itemstatus: Closed
X-SourceForge-Tracker-itemassignee: nobody
X-SourceForge-Tracker-itemupdate-reason: Settings changed
X-SourceForge-Tracker-itemupdate-username: jessorensen
Message-Id:
Date: Sat, 19 Jun 2010 21:10:37 +0000
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
 milter-greylist-4.2.3 (demeter.kernel.org [140.211.167.41]);
 Sat, 19 Jun 2010 21:10:45 +0000 (UTC)

====================================================

Dmesg shows:
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 0
...looping forever.

-Alexey "Technologov", 18.02.2008.

----------------------------------------------------------------------

>Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-19 23:10

Message:
Hi,

I verified with Marcelo (mtosatti) and the bug is supposed to be fixed.
Since there has been no activity in this one for more than two years, I
assume that is the case.

If it reappears, please open a new bug in launchpad.

Thanks,
Jes

----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-28 22:11

Message:
Logged In: YES
user_id=1865908
Originator: NO

OK. I've applied the matley patch and your debug patch to KVM 66. I've
also been able to reproduce the problem on a raw SCSI disk while
installing Windows 2003. You can find the log at:

http://mel.byu.edu/kvm-scsi-debug.log.bz2

----------------------------------------------------------------------

Comment By: Marcelo Tosatti (mtosatti)
Date: 2008-04-27 01:45

Message:
Logged In: YES
user_id=2022487
Originator: NO

Alexey, Alberto,

I'm unable to reproduce the problem with the Linux driver. The Windows
SCSI SCRIPTS code is different, so that might be the reason. The state
machine is relatively complex and depends on this SCRIPTS code.

Please try the following:

1 - Attempt to reproduce the problem with a raw disk instead of qcow2.
2 - Apply matley's patch below, and on top of that, this debug patch:

http://people.redhat.com/~mtosatti/lsi-debug-crash.patch

And then run qemu-kvm as usual, but redirect stderr output to a file:

# qemu-kvm options 2> log-scsi-crash.txt

Once the crash happens, there should be a pattern that repeats in this
output. With that information it's easier to understand what is going on.

Thanks.

----------------------------------------------------------------------

Comment By: Alf Mel (alfmel)
Date: 2008-04-12 00:46

Message:
Logged In: YES
user_id=1865908
Originator: NO

I've confirmed the problem with KVM-65 as well. I applied the patch but it
didn't work; I still experienced lockups.
I am trying to install Windows Server 2003 on a SCSI disk and the
installation keeps locking up at different points in the file copy
process. I'm using the qcow2 disk format. I tried the raw format and it
would lock up consistently when formatting the disk. I have tried
installing W2K3 at least a dozen times with the same lockups.

As part of my configuration, I move the monitor to run on a telnet server.
When the lockup occurs, I can't connect to the monitor via telnet.

I am also experiencing boot problems with Grub on SCSI disks. I reported
the problem on the mailing list:

http://article.gmane.org/gmane.comp.emulators.kvm.devel/15884

I don't know if the problems are related.

----------------------------------------------------------------------

Comment By: lanconnected (lanconnected)
Date: 2008-04-08 18:17

Message:
Logged In: YES
user_id=2041746
Originator: NO

Applied the proposed patch on kvm-65. Windows XP Pro can be installed on a
SCSI disk and boots up, but hangs unpredictably during disk activity. The
SDL window can't be closed; kvm can only be killed with kill -9.

----------------------------------------------------------------------

Comment By: Matteo Frigo (matley)
Date: 2008-03-30 14:58

Message:
Logged In: YES
user_id=35769
Originator: NO

The bug seems to have nothing to do with Windows. You can reproduce the
bug in kvm-63 and kvm-64 by creating an empty qcow2 SCSI disk and running
``dd if=/dev/sda of=/dev/null bs=1M'' in Linux.

The patch below seems to fix the problem (at least with Linux; I haven't
tried Windows).

If I understand the AIO layer correctly, scsi_read_data() and
scsi_write_data() can be called again before the bdrv_aio_read call
returns. If this happens, the original code issues the same request
twice, which is incorrect. The patch updates the sector counters
(r->sector and r->sector_count) before invoking the AIO layer.

diff -aur kvm-64.old/qemu/hw/scsi-disk.c kvm-64.new/qemu/hw/scsi-disk.c
--- kvm-64.old/qemu/hw/scsi-disk.c 2008-03-26 08:49:35.000000000 -0400
+++ kvm-64.new/qemu/hw/scsi-disk.c 2008-03-30 08:37:25.000000000 -0400
@@ -196,12 +196,12 @@
         n = SCSI_DMA_BUF_SIZE / 512;
 
     r->buf_len = n * 512;
-    r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
+    r->sector += n;
+    r->sector_count -= n;
+    r->aiocb = bdrv_aio_read(s->bdrv, r->sector - n, r->dma_buf, n,
                              scsi_read_complete, r);
     if (r->aiocb == NULL)
         scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-    r->sector += n;
-    r->sector_count -= n;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
@@ -248,12 +248,12 @@
         BADF("Data transfer already in progress\n");
     n = r->buf_len / 512;
     if (n) {
-        r->aiocb = bdrv_aio_write(s->bdrv, r->sector, r->dma_buf, n,
+        r->sector += n;
+        r->sector_count -= n;
+        r->aiocb = bdrv_aio_write(s->bdrv, r->sector - n, r->dma_buf, n,
                                   scsi_write_complete, r);
         if (r->aiocb == NULL)
             scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-        r->sector += n;
-        r->sector_count -= n;
     } else {
         /* Invoke completion routine to fetch data from host. */
         scsi_write_complete(r, 0);
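
The re-entrancy described in the explanation of this patch can be
illustrated with a small standalone model. This is only a sketch under
stated assumptions, not QEMU code: Request, fake_aio_read(), read_data(),
simulate() and CHUNK are hypothetical names, and the toy AIO layer simply
re-enters the read routine once before returning, which is the situation
the patch description assumes can happen.

```c
#include <assert.h>

#define CHUNK   8   /* sectors per request in this toy model */
#define MAX_LOG 16  /* capacity of the issued-request log */

/* Hypothetical stand-in for the SCSI request state (not QEMU's struct). */
typedef struct {
    int sector;          /* next sector to transfer */
    int sector_count;    /* sectors still to transfer */
    int update_first;    /* 1 = advance counters before issuing the read */
    int reentered;       /* set once the toy AIO layer has re-entered */
    int issued[MAX_LOG]; /* first sector of every read that was issued */
    int n_issued;
} Request;

static void read_data(Request *r);

/* Toy AIO layer (assumption): records the request, then calls back into
 * read_data() once before returning to its caller. */
static void fake_aio_read(Request *r, int sector, int n)
{
    (void)n;
    assert(r->n_issued < MAX_LOG);
    r->issued[r->n_issued++] = sector;
    if (!r->reentered) {
        r->reentered = 1;
        read_data(r);
    }
}

/* Issue the next chunk, updating the counters either before the AIO call
 * (patched order) or after it (original order). */
static void read_data(Request *r)
{
    int n = r->sector_count < CHUNK ? r->sector_count : CHUNK;

    if (n == 0)
        return;
    if (r->update_first) {
        r->sector += n;
        r->sector_count -= n;
        fake_aio_read(r, r->sector - n, n);
    } else {
        fake_aio_read(r, r->sector, n);
        r->sector += n;
        r->sector_count -= n;
    }
}

/* Run a two-chunk transfer starting at sector 0 and return the final state. */
Request simulate(int update_first)
{
    Request r = {0};

    r.sector_count = 2 * CHUNK;
    r.update_first = update_first;
    read_data(&r);
    return r;
}
```

In this model simulate(0) issues sector 0 twice and never reads sector 8,
while simulate(1) issues sectors 0 and 8. Both runs finish with
sector == 16 and sector_count == 0, so in the original order the counters
look complete even though the wrong data was requested.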