From patchwork Thu Sep 22 07:51:25 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Herongguang (Stephen)"
X-Patchwork-Id: 9344677
References: <57D90289.6020003@huawei.com>
From: "Herongguang (Stephen)"
Message-ID: <57E38D7D.1050806@huawei.com>
Date: Thu, 22 Sep 2016 15:51:25 +0800
In-Reply-To: <57D90289.6020003@huawei.com>
Subject: [Qemu-devel] [RFC/PATCH 2] kvm: x86: handle KVM_SET_VCPU_EVENTS/KVM_VCPUEVENT_VALID_SMM properly
Cc: arei.gonglei@huawei.com, "Huangweidong \(C\)" , qemu-devel@nongnu.org

After making memory consistent between source and destination
(https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg03069.html),
an instruction emulation failure can still be reproduced on the
destination side when the VM is migrated while it is in the GRUB stage:

[2016-09-15 06:29:24] monitor_qapi_event_emit:478 {"timestamp": {"seconds": 1473892164, "microseconds": 99652}, "event": "RESUME"}
KVM internal error.
Suberror: 1
emulation failure
EAX=000000b5 EBX=00008fc6 ECX=00005678 EDX=00000000
ESI=000f254b EDI=00000000 EBP=0000f958 ESP=000ee958
EIP=00008000 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =a000 000a0000 ffffffff 00809300
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     000f71a0 00000037
IDT=     00000000 00000000
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[2016-09-15 06:29:24] virtio_set_status:712 virtio-blk device status is 7 that means DRIVER OK
[2016-09-15 06:29:24] monitor_qapi_event_emit:478 {"timestamp": {"seconds": 1473892164, "microseconds": 106738}, "event": "STOP"}

Note that CS base + EIP = 0xA0000 + 0x8000 = 0xA8000, which is within
SMRAM, yet the dump shows SMM=0, which is inconsistent. I found that in
kvm_put_vcpu_events the KVM_VCPUEVENT_VALID_SMM bit in events.flags is
lost because events.flags is subsequently set to 0, which is clearly a
mistake. Patch 3 addresses that.
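
For reference, the address check behind the SMRAM observation above can be sketched in a few lines of C. This is only an illustration: the helper names are hypothetical, and the SMRAM window is assumed to be the legacy 0xA0000-0xBFFFF range, which may not match every machine configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: legacy (compatible) SMRAM window, 0xA0000-0xBFFFF. */
#define SMRAM_START 0xA0000u
#define SMRAM_END   0xBFFFFu

/* Hypothetical helper: address fetched from = segment base + EIP. */
static uint32_t linear_addr(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}

/* Hypothetical helper: does the address fall inside the assumed window? */
static bool in_smram(uint32_t addr)
{
    return addr >= SMRAM_START && addr <= SMRAM_END;
}
```

With the values from the dump (CS base 0x000a0000, EIP 0x00008000), linear_addr() gives 0xA8000 and in_smram() reports true, so a vCPU fetching from there with SMM=0 is not looking at the SMRAM contents it expects.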
After applying patch 3, however, kvm-kmod crashes:

[69328.761479] kvm [51353]: vcpu7, guest rIP: 0xfd31c unhandled wrmsr: 0xd1 data 0
[69337.406083] BUG: unable to handle kernel NULL pointer dereference at (null)
[69337.414193] IP: [] gfn_to_rmap+0xcd/0xe0 [kvm]
[69337.420357] PGD 0
[69337.422514] Oops: 0000 [#1] SMP
[69337.425783] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl lockd grace sunrpc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_conntrack ipt_REJECT nf_reject_ipv4 bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack libcrc32c ipmi_devintf ipmi_si ipmi_msghandler intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw kvm_intel gf128mul iTCO_wdt kvm glue_helper iTCO_vendor_support ablk_helper cryptd pcspkr sg mei_me mei ioatdma shpchp lpc_ich wmi mfd_core i2c_i801 vhost_net tun vhost macvtap macvlan ip_tables ext4 jbd2 mbcache sr_mod sd_mod cdrom isci libsas igb ahci libahci scsi_transport_sas crc32c_intel libata ptp pps_core serio_raw i2c_algo_bit megaraid_sas i2c_core dca dm_mod vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[69337.524020] CPU: 3 PID: 51375 Comm: CPU 0/KVM Not tainted 4.7.0 #1
[69337.530315] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2288 V2-8S/BC11SRSB1, BIOS RMIBV365 09/06/2013
[69337.540644] task: ffff88104bcc8000 ti: ffff881055650000 task.ti: ffff881055650000
[69337.548382] RIP: 0010:[] [] gfn_to_rmap+0xcd/0xe0 [kvm]
[69337.557105] RSP: 0018:ffff881055653a88 EFLAGS: 00010202
[69337.562541] RAX: ffffc9000b3cf428 RBX: ffff88085624a538 RCX: 000000000000000c
[69337.569788] RDX: ffff88083a6e2bd0 RSI: 00000000000000a7 RDI: 00000000000000a0
[69337.577036] RBP: ffff881055653a88 R08: 000000000000000c R09: ffffc9000b3cf008
[69337.584283] R10: ffffc9000b3cf000 R11: 0000000000000037 R12: ffff8810569d0000
[69337.591530] R13: 0000000000000000 R14: 00000000000000a7 R15: ffff8808d624a538
[69337.598777] FS: 00007faeb900d700(0000) GS:ffff88085e0c0000(0000) knlGS:0000000000000000
[69337.607116] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[69337.612978] CR2: 0000000000000000 CR3: 000000104c203000 CR4: 00000000000426e0
[69337.620225] Stack:
[69337.622369]  ffff881055653af0 ffffffffa056b28d ffffffff00000000 0000000000000001
[69337.630059]  0000000000000001 00007fadb50a7000 0100000000000000 ffff88083a6e2bd0
[69337.637751]  ffff8810569d0000 0000000000000001 000ffffffffff000 ffff880000000000
[69337.645442] Call Trace:
[69337.648029]  [] mmu_set_spte+0x19d/0x280 [kvm]
[69337.654166]  [] __direct_map.part.122+0x19b/0x210 [kvm]
[69337.661078]  [] tdp_page_fault+0x239/0x260 [kvm]
[69337.667385]  [] kvm_mmu_page_fault+0x60/0xf0 [kvm]
[69337.673856]  [] handle_ept_violation+0x96/0x190 [kvm_intel]
[69337.681106]  [] vmx_handle_exit+0x1f0/0xc30 [kvm_intel]
[69337.688019]  [] vcpu_enter_guest+0x90e/0x1190 [kvm]
[69337.694585]  [] kvm_arch_vcpu_ioctl_run+0xcd/0x3f0 [kvm]
[69337.701581]  [] kvm_vcpu_ioctl+0x295/0x610 [kvm]
[69337.707880]  [] ? do_futex+0x11c/0x530
[69337.713314]  [] do_vfs_ioctl+0xa6/0x5c0
[69337.718833]  [] ? do_audit_syscall_entry+0x66/0x70
[69337.725303]  [] ? syscall_trace_enter_phase1+0x11f/0x140
[69337.732291]  [] SyS_ioctl+0x79/0x90
[69337.737470]  [] do_syscall_64+0x62/0x110
[69337.743085]  [] entry_SYSCALL64_slow_path+0x25/0x25
[69337.749641] Code: 8b 38 0f b6 52 28 5d 83 e2 0f 83 ea 01 8d 0c d2 48 63 d2 48 8b 44 d0 18 48 d3 ee 48 d3 ef 48 29 fe 48 8d 04 f0 c3 8d 48 01 eb 80 <48> 8b 3c 25 00 00 00 00 31 c0 eb cb 0f 1f 80 00 00 00 00 66 66
[69337.769707] RIP [] gfn_to_rmap+0xcd/0xe0 [kvm]
[69337.775938] RSP
[69337.779552] CR2: 0000000000000000
[69337.783467] ---[ end trace 5350f10b8de91e83 ]---
[69337.843957] Kernel panic - not syncing: Fatal exception
[69337.849430] Kernel Offset: disabled

When the kernel crashes, the call path is tdp_page_fault ->
__direct_map -> mmu_set_spte -> rmap_add -> gfn_to_rmap, which calls
__gfn_to_memslot and then __gfn_to_rmap. Here __gfn_to_memslot returned
NULL, and __gfn_to_rmap dereferences that NULL slot pointer without
checking it.

Further investigation showed that the faulting gfn is 0xA7 (guest
physical address 0xA7000), which is again in the SMRAM region. At that
point the SMM bit in the role used by kvm_memslots_for_spte_role is 0,
although (vcpu->arch.hflags & HF_SMM_MASK) is actually set!

So I think kvm-kmod’s kvm_vcpu_ioctl_x86_set_vcpu_events is missing
something in its handling of KVM_VCPUEVENT_VALID_SMM. I tried the
following patch, and it seems to work fine. Do you think this patch is
appropriate, or is it not enough? Thanks.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19f9f9e..f39e839 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3013,8 +3013,10 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 		vcpu->arch.apic->sipi_vector = events->sipi_vector;
 
 	if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
-		if (events->smi.smm)
+		if (events->smi.smm) {
 			vcpu->arch.hflags |= HF_SMM_MASK;
+			kvm_mmu_reset_context(vcpu);
+		}
 		else
 			vcpu->arch.hflags &= ~HF_SMM_MASK;
 		vcpu->arch.smi_pending = events->smi.pending;
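
To illustrate the crash analysis above, here is a toy user-space model of a memslot lookup that is selected by an SMM role bit. It is a sketch under stated assumptions, not kernel code: all toy_* names and the slot layouts are hypothetical, and it only assumes that SMRAM (gfn 0xA0-0xBF) has a slot in the SMM address space and none in the normal one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy type; not the kernel's struct kvm_memory_slot. */
struct toy_slot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Index 0: normal address space, index 1: SMM address space. */
struct toy_vm {
    const struct toy_slot *slots[2];
    size_t nslots[2];
};

/* Made-up layouts: only the SMM view has a slot covering SMRAM. */
static const struct toy_slot toy_normal_slots[] = {
    { 0x00, 0xa0 },   /* low RAM below 0xA0000 */
    { 0xc0, 0x1000 }, /* RAM above the legacy hole (size arbitrary) */
};
static const struct toy_slot toy_smm_slots[] = {
    { 0x00, 0xa0 },
    { 0xa0, 0x20 },   /* SMRAM: gfn 0xA0..0xBF */
    { 0xc0, 0x1000 },
};
static const struct toy_vm toy = {
    { toy_normal_slots, toy_smm_slots },
    { 2, 3 },
};

/* Look up gfn in the address space chosen by the (possibly stale) role. */
static const struct toy_slot *toy_gfn_to_slot(const struct toy_vm *vm,
                                              bool role_smm, uint64_t gfn)
{
    int as = role_smm ? 1 : 0;

    for (size_t i = 0; i < vm->nslots[as]; i++) {
        const struct toy_slot *s = &vm->slots[as][i];

        if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
            return s;
    }
    return NULL; /* callers must check this before dereferencing */
}
```

In this model, looking up gfn 0xA7 with role_smm == false returns NULL, while the same lookup with role_smm == true finds the SMRAM slot. That mirrors the situation described above, where the role was computed before hflags gained HF_SMM_MASK and was not refreshed.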