From patchwork Tue May 6 18:01:36 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 4123541 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id D5CA49F1E1 for ; Tue, 6 May 2014 18:01:58 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id C6D4B20254 for ; Tue, 6 May 2014 18:01:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 945EA201EC for ; Tue, 6 May 2014 18:01:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751914AbaEFSBw (ORCPT ); Tue, 6 May 2014 14:01:52 -0400 Received: from e28smtp03.in.ibm.com ([122.248.162.3]:40654 "EHLO e28smtp03.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751152AbaEFSBv (ORCPT ); Tue, 6 May 2014 14:01:51 -0400 Received: from /spool/local by e28smtp03.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 6 May 2014 23:31:49 +0530 Received: from d28dlp03.in.ibm.com (9.184.220.128) by e28smtp03.in.ibm.com (192.168.1.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 6 May 2014 23:31:46 +0530 Received: from d28relay01.in.ibm.com (d28relay01.in.ibm.com [9.184.220.58]) by d28dlp03.in.ibm.com (Postfix) with ESMTP id D2DDC1258048; Tue, 6 May 2014 23:30:42 +0530 (IST) Received: from d28av03.in.ibm.com (d28av03.in.ibm.com [9.184.220.65]) by d28relay01.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s46I1pll59441330; Tue, 6 May 2014 23:31:51 +0530 Received: from d28av03.in.ibm.com (localhost [127.0.0.1]) by d28av03.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s46I1ipC014837; Tue, 6 May 2014 23:31:45 +0530 Received: from skywalker.in.ibm.com ([9.124.214.233]) by d28av03.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id s46I1i67014822; Tue, 6 May 2014 23:31:44 +0530 From: "Aneesh Kumar K.V" To: agraf@suse.de, benh@kernel.crashing.org, paulus@samba.org Cc: linuxppc-dev@lists.ozlabs.org, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org, "Aneesh Kumar K.V" Subject: [PATCH V2] KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest Date: Tue, 6 May 2014 23:31:36 +0530 Message-Id: <1399399296-17711-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> X-Mailer: git-send-email 1.9.1 X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14050618-3864-0000-0000-00000E0F4CB4 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS). With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. [ Note that the actual page size is always >= base page size ]. We use "ibm,segment-page-sizes" device tree node to advertise the MPSS support to PAPR guest. The penc encoding indicates whether we support a specific combination of base page size and actual page size in the same segment. We also use the penc value in the LP encoding of HPTE entry. This patch exposes MPSS support to KVM guest by advertising the feature via "ibm,segment-page-sizes". It also adds the necessary changes to decode the base page size and the actual page size correctly from the HPTE entry. Signed-off-by: Aneesh Kumar K.V --- Changes from V1: * Update commit message * Rename variables as per review feedback arch/powerpc/include/asm/kvm_book3s_64.h | 146 ++++++++++++++++++++++++++----- arch/powerpc/kvm/book3s_hv.c | 7 ++ 2 files changed, 130 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 51388befeddb..fddb72b48ce9 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -77,34 +77,122 @@ static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits) return old == 0; } +static inline int __hpte_actual_psize(unsigned int lp, int psize) +{ + int i, shift; + unsigned int mask; + + /* start from 1 ignoring MMU_PAGE_4K */ + for (i = 1; i < MMU_PAGE_COUNT; i++) { + + /* invalid penc */ + if (mmu_psize_defs[psize].penc[i] == -1) + continue; + /* + * encoding bits per actual page size + * PTE LP actual page size + * rrrr rrrz >=8KB + * rrrr rrzz >=16KB + * rrrr rzzz >=32KB + * rrrr zzzz >=64KB + * ....... + */ + shift = mmu_psize_defs[i].shift - LP_SHIFT; + if (shift > LP_BITS) + shift = LP_BITS; + mask = (1 << shift) - 1; + if ((lp & mask) == mmu_psize_defs[psize].penc[i]) + return i; + } + return -1; +} + static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, unsigned long pte_index) { - unsigned long rb, va_low; + int b_psize, a_psize; + unsigned int penc; + unsigned long rb = 0, va_low, sllp; + unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1); + + if (!(v & HPTE_V_LARGE)) { + /* both base and actual psize is 4k */ + b_psize = MMU_PAGE_4K; + a_psize = MMU_PAGE_4K; + } else { + for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) { + + /* valid entries have a shift value */ + if (!mmu_psize_defs[b_psize].shift) + continue; + a_psize = __hpte_actual_psize(lp, b_psize); + if (a_psize != -1) + break; + } + } + /* + * Ignore the top 14 bits of va + * v have top two bits covering segment size, hence move + * by 16 bits, Also clear the lower HPTE_V_AVPN_SHIFT (7) bits. + * AVA field in v also have the lower 23 bits ignored. + * For base page size 4K we need 14 .. 65 bits (so need to + * collect extra 11 bits) + * For others we need 14..14+i + */ + /* This covers 14..54 bits of va*/ rb = (v & ~0x7fUL) << 16; /* AVA field */ + /* + * AVA in v had cleared lower 23 bits. We need to derive + * that from pteg index + */ va_low = pte_index >> 3; if (v & HPTE_V_SECONDARY) va_low = ~va_low; - /* xor vsid from AVA */ + /* + * get the vpn bits from va_low using reverse of hashing. + * In v we have va with 23 bits dropped and then left shifted + * HPTE_V_AVPN_SHIFT (7) bits. Now to find vsid we need + * right shift it with (SID_SHIFT - (23 - 7)) + */ if (!(v & HPTE_V_1TB_SEG)) - va_low ^= v >> 12; + va_low ^= v >> (SID_SHIFT - 16); else - va_low ^= v >> 24; + va_low ^= v >> (SID_SHIFT_1T - 16); va_low &= 0x7ff; - if (v & HPTE_V_LARGE) { - rb |= 1; /* L field */ - if (cpu_has_feature(CPU_FTR_ARCH_206) && - (r & 0xff000)) { - /* non-16MB large page, must be 64k */ - /* (masks depend on page size) */ - rb |= 0x1000; /* page encoding in LP field */ - rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */ - rb |= ((va_low << 4) & 0xf0); /* AVAL field (P7 doesn't seem to care) */ - } - } else { - /* 4kB page */ - rb |= (va_low & 0x7ff) << 12; /* remaining 11b of VA */ + + switch (b_psize) { + case MMU_PAGE_4K: + sllp = ((mmu_psize_defs[a_psize].sllp & SLB_VSID_L) >> 6) | + ((mmu_psize_defs[a_psize].sllp & SLB_VSID_LP) >> 4); + rb |= sllp << 5; /* AP field */ + rb |= (va_low & 0x7ff) << 12; /* remaining 11 bits of AVA */ + break; + default: + { + int aval_shift; + /* + * remaining 7bits of AVA/LP fields + * Also contain the rr bits of LP + */ + rb |= (va_low & 0x7f) << 16; + /* + * Now clear not needed LP bits based on actual psize + */ + rb &= ~((1ul << mmu_psize_defs[a_psize].shift) - 1); + /* + * AVAL field 58..77 - base_page_shift bits of va + * we have space for 58..64 bits, Missing bits should + * be zero filled. +1 is to take care of L bit shift + */ + aval_shift = 64 - (77 - mmu_psize_defs[b_psize].shift) + 1; + rb |= ((va_low << aval_shift) & 0xfe); + + rb |= 1; /* L field */ + penc = mmu_psize_defs[b_psize].penc[a_psize]; + rb |= penc << 12; /* LP field */ + break; + } } rb |= (v >> 54) & 0x300; /* B field */ return rb; @@ -112,14 +200,26 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r, static inline unsigned long hpte_page_size(unsigned long h, unsigned long l) { + int size, a_psize; + /* Look at the 8 bit LP value */ + unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1); + /* only handle 4k, 64k and 16M pages for now */ if (!(h & HPTE_V_LARGE)) - return 1ul << 12; /* 4k page */ - if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206)) - return 1ul << 16; /* 64k page */ - if ((l & 0xff000) == 0) - return 1ul << 24; /* 16M page */ - return 0; /* error */ + return 1ul << 12; + else { + for (size = 0; size < MMU_PAGE_COUNT; size++) { + /* valid entries have a shift value */ + if (!mmu_psize_defs[size].shift) + continue; + + a_psize = __hpte_actual_psize(lp, size); + if (a_psize != -1) + return 1ul << mmu_psize_defs[a_psize].shift; + } + + } + return 0; } static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8227dba5af0f..f5d4b9d77dc9 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps, * support pte_enc here */ (*sps)->enc[0].pte_enc = def->penc[linux_psize]; + /* + * Add 16MB MPSS support if host supports it + */ + if (linux_psize != MMU_PAGE_16M && def->penc[MMU_PAGE_16M] != -1) { + (*sps)->enc[1].page_shift = 24; + (*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M]; + } (*sps)++; }