From patchwork Fri Jan 9 01:42:06 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mario Smarduch
X-Patchwork-Id: 5596351
From: Mario Smarduch
To: christoffer.dall@linaro.org, marc.zyngier@arm.com
Subject: [PATCH RESEND v15 07/10] KVM: arm: page logging 2nd stage fault handling
Date: Thu, 08 Jan 2015 17:42:06 -0800
Message-id: <1420767727-20851-1-git-send-email-m.smarduch@samsung.com>
X-Mailer: git-send-email 1.7.9.5
Cc: linux-arm-kernel@lists.infradead.org, pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Mario Smarduch

This patch adds support for handling 2nd stage page faults during migration.
It disables faulting in huge pages and dissolves huge pages into normal pages.
If migration is canceled, huge pages are used again, memory conditions
permitting.

It applies cleanly on top of the patch series posted Dec 15:
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-December/012826.html

Patch #11 of the series has been dropped.

Signed-off-by: Mario Smarduch
---
Change Log since last RESEND:
- fixed bug exposed by __get_user_pages_fast(): when region is writable,
  prevent write protection of pte so we can handle a future write fault and
  mark page dirty.
- Removed marking entire huge page dirty on initial dirty log read.
- don't dissolve non-writable huge pages
- Made updates based on Christoffer's comments
  - renamed logging status function to memslot_is_logging()
  - changes few values to bool from
  - streamlined user_mem_abort() to eliminate extra conditional checks
---
 arch/arm/kvm/mmu.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 73d506f..2bfe22d 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -47,6 +47,18 @@ static phys_addr_t hyp_idmap_vector;
 #define kvm_pmd_huge(_x)	(pmd_huge(_x) || pmd_trans_huge(_x))
 #define kvm_pud_huge(_x)	pud_huge(_x)
 
+#define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
+#define KVM_S2PTE_FLAG_LOGGING_ACTIVE	(1UL << 1)
+
+static bool memslot_is_logging(struct kvm_memory_slot *memslot)
+{
+#ifdef CONFIG_ARM
+	return !!memslot->dirty_bitmap;
+#else
+	return false;
+#endif
+}
+
 static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 {
 	/*
@@ -59,6 +71,25 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+/**
+ * stage2_dissolve_pmd() - clear and flush huge PMD entry
+ * @kvm:	pointer to kvm structure.
+ * @addr:	IPA
+ * @pmd:	pmd pointer for IPA
+ *
+ * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. Marks all
+ * pages in the range dirty.
+ */
+static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
+{
+	if (!kvm_pmd_huge(*pmd))
+		return;
+
+	pmd_clear(pmd);
+	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	put_page(virt_to_page(pmd));
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 				  int min, int max)
 {
@@ -703,10 +734,13 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 }
 
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
-			  phys_addr_t addr, const pte_t *new_pte, bool iomap)
+			  phys_addr_t addr, const pte_t *new_pte,
+			  unsigned long flags)
 {
 	pmd_t *pmd;
 	pte_t *pte, old_pte;
+	bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
+	bool logging_active = flags & KVM_S2PTE_FLAG_LOGGING_ACTIVE;
 
 	/* Create stage-2 page table mapping - Levels 0 and 1 */
 	pmd = stage2_get_pmd(kvm, cache, addr);
@@ -718,6 +752,13 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 		return 0;
 	}
 
+	/*
+	 * While dirty page logging - dissolve huge PMD, then continue on to
+	 * allocate page.
+	 */
+	if (logging_active)
+		stage2_dissolve_pmd(kvm, addr, pmd);
+
 	/* Create stage-2 page mappings - Level 2 */
 	if (pmd_none(*pmd)) {
 		if (!cache)
@@ -774,7 +815,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(kvm, &cache, addr, &pte, true);
+		ret = stage2_set_pte(kvm, &cache, addr, &pte,
+				     KVM_S2PTE_FLAG_IS_IOMAP);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
 			goto out;
@@ -1002,6 +1044,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	pfn_t pfn;
 	pgprot_t mem_type = PAGE_S2;
 	bool fault_ipa_uncached;
+	bool can_set_pte_rw = true;
+	unsigned long set_pte_flags = 0;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -1009,6 +1053,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
+
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1065,6 +1110,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
+
+	/*
+	 * When logging is enabled general page fault handling changes:
+	 * - Writable huge pages are dissolved on a read or write fault.
+	 * - pte's are not allowed write permission on a read fault to
+	 *   writable region so future writes can be marked dirty
+	 * - access to non-writable region is unchanged
+	 */
+	if (memslot_is_logging(memslot) && writable) {
+		set_pte_flags = KVM_S2PTE_FLAG_LOGGING_ACTIVE;
+		if (hugetlb) {
+			gfn += pte_index(fault_ipa);
+			pfn += pte_index(fault_ipa);
+			hugetlb = false;
+		}
+		force_pte = true;
+		if (!write_fault)
+			can_set_pte_rw = false;
+	}
+
 	if (!hugetlb && !force_pte)
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
@@ -1082,16 +1147,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
-		if (writable) {
+
+		if (pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE))
+			set_pte_flags |= KVM_S2PTE_FLAG_IS_IOMAP;
+
+		/*
+		 * Don't set write permission, for non-writable region, and
+		 * for read fault to writable region while logging.
+		 */
+		if (writable && can_set_pte_rw) {
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
 					  fault_ipa_uncached);
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
-			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
+				     set_pte_flags);
 	}
 
+	if (write_fault)
+		mark_page_dirty(kvm, gfn);
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
@@ -1242,7 +1317,14 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;
 
-	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	/*
+	 * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
+	 * flag clear because MMU notifiers will have unmapped a huge PMD before
+	 * calling ->change_pte() (which in turn calls kvm_set_spte_hva()) and
+	 * therefore stage2_set_pte() never needs to clear out a huge PMD
+	 * through this calling path.
+	 */
+	stage2_set_pte(kvm, NULL, gpa, pte, 0);
 }
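
Illustration only, not part of the patch: the decision that user_mem_abort()
makes while logging can be modelled in user space roughly as below. This is a
sketch under stated assumptions. The names struct fault_in, struct fault_plan,
plan_fault() and the MODEL_* constants are hypothetical and do not exist in
the kernel. It assumes 4 KiB pages with 512 PTEs per table, assumes gfn/pfn
name the base of the huge page when hugetlb is set, and leaves out transparent
huge pages, device mappings and the page table walk itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: 4 KiB pages, 512 PTEs per page table. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PTRS_PER_PTE	512

/* What the fault handler knows when it reaches the logging check. */
struct fault_in {
	bool logging;		/* memslot has a dirty bitmap */
	bool writable;		/* faulting region is writable */
	bool write_fault;	/* guest access was a write */
	bool hugetlb;		/* fault resolved to a hugetlbfs page */
	uint64_t fault_ipa;	/* faulting guest physical address */
	uint64_t gfn;		/* huge page base frame when hugetlb is set */
	uint64_t pfn;		/* host frame, same granularity as gfn */
};

/* What the handler decides to do with the mapping. */
struct fault_plan {
	bool logging_flag;	/* would pass KVM_S2PTE_FLAG_LOGGING_ACTIVE */
	bool map_huge;		/* install a huge PMD instead of a pte */
	bool force_pte;		/* never upgrade to a transparent huge page */
	bool pte_writable;	/* grant write permission in the mapping */
	bool mark_dirty;	/* record gfn in the dirty bitmap */
	uint64_t gfn;
	uint64_t pfn;
};

/* Index of the 4 KiB page inside its page table (and huge page). */
static uint64_t model_pte_index(uint64_t ipa)
{
	return (ipa >> MODEL_PAGE_SHIFT) & (MODEL_PTRS_PER_PTE - 1);
}

static struct fault_plan plan_fault(const struct fault_in *in)
{
	struct fault_plan p = {
		.map_huge = in->hugetlb,
		.pte_writable = in->writable,
		.mark_dirty = in->write_fault,
		.gfn = in->gfn,
		.pfn = in->pfn,
	};

	if (in->logging && in->writable) {
		p.logging_flag = true;
		if (in->hugetlb) {
			/* Step from the huge page base to the faulting page. */
			p.gfn += model_pte_index(in->fault_ipa);
			p.pfn += model_pte_index(in->fault_ipa);
			p.map_huge = false;
		}
		p.force_pte = true;
		/* A read fault stays read-only so the next write faults. */
		if (!in->write_fault)
			p.pte_writable = false;
	}

	/* A logged fault must never end up mapping a huge page. */
	assert(!(p.logging_flag && p.map_huge));
	return p;
}
```

Feeding the model a read fault at IPA 0x40203000 inside a writable hugetlbfs
page while logging yields a read-only pte mapping for gfn base + 3 with no
dirty marking. A later write to the same page faults again and is then marked
dirty. Non-writable regions take the unchanged path.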