
KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.

Message ID 1399224322-22028-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Aneesh Kumar K.V May 4, 2014, 5:25 p.m. UTC
We reserve 5% of total RAM for CMA allocation, and not using it can
result in running out of NUMA node memory with certain configurations.
One caveat is that we may not get a node-local HPT with a pinned-vCPU
configuration, but currently libvirt pins the vCPUs to a cpuset only
after creating the hash page table.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

Comments

Alexander Graf May 5, 2014, 11:26 a.m. UTC | #1
On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
> We reserve 5% of total ram for CMA allocation and not using that can
> result in us running out of numa node memory with specific
> configuration. One caveat is we may not have node local hpt with pinned
> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
> after creating hash page table.

I don't understand the problem. Can you please elaborate?


Alex

>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>   arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++++++-----------------
>   1 file changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index fb25ebc0af0c..f32896ffd784 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm);
>   
>   long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>   {
> -	unsigned long hpt;
> +	unsigned long hpt = 0;
>   	struct revmap_entry *rev;
>   	struct page *page = NULL;
>   	long order = KVM_DEFAULT_HPT_ORDER;
> @@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>   	}
>   
>   	kvm->arch.hpt_cma_alloc = 0;
> -	/*
> -	 * try first to allocate it from the kernel page allocator.
> -	 * We keep the CMA reserved for failed allocation.
> -	 */
> -	hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT |
> -			       __GFP_NOWARN, order - PAGE_SHIFT);
> -
> -	/* Next try to allocate from the preallocated pool */
> -	if (!hpt) {
> -		VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
> -		page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
> -		if (page) {
> -			hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
> -			kvm->arch.hpt_cma_alloc = 1;
> -		} else
> -			--order;
> +	VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
> +	page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
> +	if (page) {
> +		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
> +		kvm->arch.hpt_cma_alloc = 1;
>   	}
>   
>   	/* Lastly try successively smaller sizes from the page allocator */

Aneesh Kumar K.V May 5, 2014, 2:35 p.m. UTC | #2
Alexander Graf <agraf@suse.de> writes:

> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>> We reserve 5% of total ram for CMA allocation and not using that can
>> result in us running out of numa node memory with specific
>> configuration. One caveat is we may not have node local hpt with pinned
>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>> after creating hash page table.
>
> I don't understand the problem. Can you please elaborate?
>
>

Let's take a system with 100GB of RAM. We reserve around 5GB of that for
htab allocation. Now, if we use the rest of the available memory for
hugetlbfs (because we want all the guests to be backed by huge pages), we
end up in a situation where we have a few GB of free RAM plus the 5GB CMA
reserve area. If we then allow hash page table allocation to consume that
free space, we hit page allocation failures for other non-movable kernel
allocations even though we still have 5GB of CMA reserve space free.

-aneesh
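
For reference, the ~5% figure above comes from a CMA region that is carved
out at boot specifically for KVM hash page tables. A simplified sketch of
that sizing step, loosely based on the kvm_cma_resv_ratio handling in
arch/powerpc/kvm/book3s_hv_builtin.c of this era (names and details are
approximate, not verbatim kernel code):

/*
 * Sketch: sizing the boot-time CMA reserve used for KVM hash page tables.
 * kvm_cma_resv_ratio defaults to 5 (percent) and can be overridden with
 * the "kvm_cma_resv_ratio=" boot parameter.
 */
static unsigned long kvm_cma_resv_ratio = 5;

void __init kvm_cma_reserve(void)
{
	unsigned long selected_size;

	/* kvm_cma_resv_ratio percent of all memory known to memblock */
	selected_size = PAGE_ALIGN(memblock_phys_mem_size() *
				   kvm_cma_resv_ratio / 100);
	if (!selected_size)
		return;

	pr_info("KVM: reserving %lu MiB of CMA for hash page tables\n",
		selected_size >> 20);
	/*
	 * The region is then handed to the CMA allocator that backs
	 * kvm_alloc_hpt(); the actual declaration call is omitted here.
	 */
}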

Alexander Graf May 5, 2014, 3:16 p.m. UTC | #3
> Am 05.05.2014 um 16:35 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
> 
> Alexander Graf <agraf@suse.de> writes:
> 
>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>> We reserve 5% of total ram for CMA allocation and not using that can
>>> result in us running out of numa node memory with specific
>>> configuration. One caveat is we may not have node local hpt with pinned
>>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>>> after creating hash page table.
>> 
>> I don't understand the problem. Can you please elaborate?
> 
> Lets take a system with 100GB RAM. We reserve around 5GB for htab
> allocation. Now if we use rest of available memory for hugetlbfs
> (because we want all the guest to be backed by huge pages), we would
> end up in a situation where we have a few GB of free RAM and 5GB of CMA
> reserve area. Now if we allow hash page table allocation to consume the
> free space, we would end up hitting page allocation failure for other
> non movable kernel allocation even though we still have 5GB CMA reserve
> space free.

Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails, no?

The fact that KVM uses a good number of normal kernel pages is maybe suboptimal, but shouldn't be a critical problem.


Alex

> 
> -aneesh
> 
Aneesh Kumar K.V May 5, 2014, 3:40 p.m. UTC | #4
Alexander Graf <agraf@suse.de> writes:

>> Am 05.05.2014 um 16:35 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
>> 
>> Alexander Graf <agraf@suse.de> writes:
>> 
>>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>>> We reserve 5% of total ram for CMA allocation and not using that can
>>>> result in us running out of numa node memory with specific
>>>> configuration. One caveat is we may not have node local hpt with pinned
>>>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>>>> after creating hash page table.
>>> 
>>> I don't understand the problem. Can you please elaborate?
>> 
>> Lets take a system with 100GB RAM. We reserve around 5GB for htab
>> allocation. Now if we use rest of available memory for hugetlbfs
>> (because we want all the guest to be backed by huge pages), we would
>> end up in a situation where we have a few GB of free RAM and 5GB of CMA
>> reserve area. Now if we allow hash page table allocation to consume the
>> free space, we would end up hitting page allocation failure for other
>> non movable kernel allocation even though we still have 5GB CMA reserve
>> space free.
>
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

But there is not much to swap, because most of the memory is reserved for
guest RAM via hugetlbfs.

>
> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

Yes. But in this case we could do better, couldn't we? We already have a
large part of RAM kept aside for htab allocation which cannot be used for
non-movable allocations, and with the current code we ignore that reserved
space and use other areas for hash page table allocation.

We actually hit this case on one of our test boxes:

 KVM guest htab at c000001e50000000 (order 30), LPID 1
 libvirtd invoked oom-killer: gfp_mask=0x2000d0, order=0, oom_score_adj=0
 libvirtd cpuset=/ mems_allowed=0,16
 CPU: 72 PID: 20044 Comm: libvirtd Not tainted 3.10.23-1401.pkvm2_1.4.ppc64 #1
 Call Trace:
 [c000001e3b63f150] [c000000000017330] .show_stack+0x130/0x200 (unreliable)
 [c000001e3b63f220] [c00000000087a888] .dump_stack+0x28/0x3c
 [c000001e3b63f290] [c000000000876a4c] .dump_header+0xbc/0x228
 [c000001e3b63f360] [c0000000001dd838] .oom_kill_process+0x318/0x4c0
 [c000001e3b63f440] [c0000000001de258] .out_of_memory+0x518/0x550
 [c000001e3b63f520] [c0000000001e5aac] .__alloc_pages_nodemask+0xb3c/0xbf0
 [c000001e3b63f700] [c000000000243580] .new_slab+0x440/0x490
 [c000001e3b63f7a0] [c0000000008781fc] .__slab_alloc+0x17c/0x618
 [c000001e3b63f8d0] [c0000000002467fc] .kmem_cache_alloc_node_trace+0xcc/0x300
 [c000001e3b63f990] [c00000000010f62c] .alloc_fair_sched_group+0xfc/0x200
 [c000001e3b63fa60] [c000000000104f00] .sched_create_group+0x50/0xe0
 [c000001e3b63fae0] [c000000000104fc0] .cpu_cgroup_css_alloc+0x30/0x80
 [c000001e3b63fb60] [c0000000001513ec] .cgroup_mkdir+0x2bc/0x6e0
 [c000001e3b63fc50] [c000000000275aec] .vfs_mkdir+0x14c/0x220
 [c000001e3b63fcf0] [c00000000027a734] .SyS_mkdirat+0x94/0x110
 [c000001e3b63fdb0] [c00000000027a7e4] .SyS_mkdir+0x34/0x50
 [c000001e3b63fe30] [c000000000009f54] syscall_exit+0x0/0x98


Node 0 DMA free:23424kB min:23424kB low:29248kB high:35136kB
active_anon:0kB inactive_anon:128kB active_file:256kB inactive_file:384kB
unevictable:9536kB isolated(anon):0kB isolated(file):0kB present:67108864kB
managed:65931776kB mlocked:9536kB dirty:64kB writeback:0kB mapped:5376kB
shmem:0kB slab_reclaimable:23616kB slab_unreclaimable:1237056kB
kernel_stack:18256kB pagetables:1088kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:78 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 16 DMA free:5787008kB min:21376kB low:26688kB high:32064kB
active_anon:1984kB inactive_anon:2112kB active_file:896kB inactive_file:64kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB
managed:60060032kB mlocked:0kB dirty:128kB writeback:3712kB mapped:0kB
shmem:0kB slab_reclaimable:23424kB slab_unreclaimable:826048kB
kernel_stack:576kB pagetables:1408kB unstable:0kB bounce:0kB free_cma:5767040kB
writeback_tmp:0kB pages_scanned:756 all_unreclaimable? yes

Benjamin Herrenschmidt May 6, 2014, 12:06 a.m. UTC | #5
On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

Possibly but the fact remains, this can be avoided by making sure that
if we create a CMA reserve for KVM, then it uses it rather than using
the rest of main memory for hash tables.

> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

The point is that we explicitly reserve those pages in CMA for use
by KVM for that specific purpose, but the current code tries first
to get them out of the normal pool.

This is not optimal behaviour and is what Aneesh's patches are
trying to fix.

Cheers,
Ben.


Alexander Graf May 6, 2014, 7:05 a.m. UTC | #6
On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>> Isn't this a greater problem? We should start swapping before we hit
>> the point where non movable kernel allocation fails, no?
> Possibly but the fact remains, this can be avoided by making sure that
> if we create a CMA reserve for KVM, then it uses it rather than using
> the rest of main memory for hash tables.

So why were we preferring non-CMA memory before? Considering that Aneesh 
introduced that logic in fa61a4e3 I suppose this was just a mistake?

>> The fact that KVM uses a good number of normal kernel pages is maybe
>> suboptimal, but shouldn't be a critical problem.
> The point is that we explicitly reserve those pages in CMA for use
> by KVM for that specific purpose, but the current code tries first
> to get them out of the normal pool.
>
> This is not an optimal behaviour and is what Aneesh patches are
> trying to fix.

I agree, and I agree that it's worth it to make better use of our 
resources. But we still shouldn't crash.

However, reading through this thread I think I've slowly grasped what 
the problem is. The hugetlbfs size calculation.

I guess something in your stack overreserves huge pages because it 
doesn't account for the fact that some part of system memory is already 
reserved for CMA.

So the underlying problem is something completely orthogonal. The patch 
body as is is fine, but the patch description should simply say that we 
should prefer the CMA region because it's already reserved for us for 
this purpose and we make better use of our available resources that way.

All the bits about pinning, numa, libvirt and whatnot don't really 
matter and are just details that led Aneesh to find this non-optimal 
allocation.


Alex

Benjamin Herrenschmidt May 6, 2014, 7:19 a.m. UTC | #7
On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
> > On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
> >> Isn't this a greater problem? We should start swapping before we hit
> >> the point where non movable kernel allocation fails, no?
> > Possibly but the fact remains, this can be avoided by making sure that
> > if we create a CMA reserve for KVM, then it uses it rather than using
> > the rest of main memory for hash tables.
> 
> So why were we preferring non-CMA memory before? Considering that Aneesh 
> introduced that logic in fa61a4e3 I suppose this was just a mistake?

I assume so.

> >> The fact that KVM uses a good number of normal kernel pages is maybe
> >> suboptimal, but shouldn't be a critical problem.
> > The point is that we explicitly reserve those pages in CMA for use
> > by KVM for that specific purpose, but the current code tries first
> > to get them out of the normal pool.
> >
> > This is not an optimal behaviour and is what Aneesh patches are
> > trying to fix.
> 
> I agree, and I agree that it's worth it to make better use of our 
> resources. But we still shouldn't crash.

Well, Linux hitting out of memory conditions has never been a happy
story :-)

> However, reading through this thread I think I've slowly grasped what 
> the problem is. The hugetlbfs size calculation.

Not really.

> I guess something in your stack overreserves huge pages because it 
> doesn't account for the fact that some part of system memory is already 
> reserved for CMA.

Either that or simply Linux runs out because we dirty too fast...
really, Linux has never been good at dealing with OOM situations,
especially when things like network drivers and filesystems try to do
ATOMIC or NOIO allocs...
 
> So the underlying problem is something completely orthogonal. The patch 
> body as is is fine, but the patch description should simply say that we 
> should prefer the CMA region because it's already reserved for us for 
> this purpose and we make better use of our available resources that way.

No.

We give a chunk of memory to hugetlbfs, it's all good and fine.

Whatever remains is split between CMA and the normal page allocator.

Without Aneesh's latest patch, when creating guests, KVM starts allocating
its hash tables from the latter instead of from CMA (we never allocate from
the hugetlb pool afaik, only guest pages do that, not hash tables).

So we exhaust the page allocator and get Linux into OOM conditions
while there's plenty of space in CMA. But the kernel cannot use CMA for
its own allocations, only to back user pages, which we don't care about
because our guest pages are covered by our hugetlb reserve :-)

> All the bits about pinning, numa, libvirt and whatnot don't really 
> matter and are just details that led Aneesh to find this non-optimal 
> allocation.

Cheers,
Ben.
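
To illustrate the point about the kernel not being able to use CMA for its
own allocations: CMA pageblocks are only eligible for movable allocations,
which is why a node can sit on gigabytes of free_cma while GFP_KERNEL
allocations OOM. A minimal illustrative sketch (not code from this patch):

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Illustrative only: which allocations may be satisfied from CMA
 * (MIGRATE_CMA) pageblocks.
 */
static void cma_visibility_example(void)
{
	/*
	 * Movable allocation, as used for user/guest memory: may land in a
	 * CMA pageblock, because it can be migrated out later if the
	 * contiguous region is needed.
	 */
	struct page *movable = alloc_page(GFP_HIGHUSER_MOVABLE);

	/*
	 * Unmovable kernel allocation (slab, kernel stacks, ...): never
	 * placed in CMA pageblocks, so it can fail even while free_cma is
	 * large -- which is what the OOM report earlier in the thread shows.
	 */
	struct page *unmovable = alloc_page(GFP_KERNEL);

	if (movable)
		__free_page(movable);
	if (unmovable)
		__free_page(unmovable);
}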


Alexander Graf May 6, 2014, 7:21 a.m. UTC | #8
On 06.05.14 09:19, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
>> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
>>> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>>>> Isn't this a greater problem? We should start swapping before we hit
>>>> the point where non movable kernel allocation fails, no?
>>> Possibly but the fact remains, this can be avoided by making sure that
>>> if we create a CMA reserve for KVM, then it uses it rather than using
>>> the rest of main memory for hash tables.
>> So why were we preferring non-CMA memory before? Considering that Aneesh
>> introduced that logic in fa61a4e3 I suppose this was just a mistake?
> I assume so.
>
>>>> The fact that KVM uses a good number of normal kernel pages is maybe
>>>> suboptimal, but shouldn't be a critical problem.
>>> The point is that we explicitly reserve those pages in CMA for use
>>> by KVM for that specific purpose, but the current code tries first
>>> to get them out of the normal pool.
>>>
>>> This is not an optimal behaviour and is what Aneesh patches are
>>> trying to fix.
>> I agree, and I agree that it's worth it to make better use of our
>> resources. But we still shouldn't crash.
> Well, Linux hitting out of memory conditions has never been a happy
> story :-)
>
>> However, reading through this thread I think I've slowly grasped what
>> the problem is. The hugetlbfs size calculation.
> Not really.
>
>> I guess something in your stack overreserves huge pages because it
>> doesn't account for the fact that some part of system memory is already
>> reserved for CMA.
> Either that or simply Linux runs out because we dirty too fast...
> really, Linux has never been good at dealing with OO situations,
> especially when things like network drivers and filesystems try to do
> ATOMIC or NOIO allocs...
>   
>> So the underlying problem is something completely orthogonal. The patch
>> body as is is fine, but the patch description should simply say that we
>> should prefer the CMA region because it's already reserved for us for
>> this purpose and we make better use of our available resources that way.
> No.
>
> We give a chunk of memory to hugetlbfs, it's all good and fine.
>
> Whatever remains is split between CMA and the normal page allocator.
>
> Without Aneesh latest patch, when creating guests, KVM starts allocating
> it's hash tables from the latter instead of CMA (we never allocate from
> hugetlb pool afaik, only guest pages do that, not hash tables).
>
> So we exhaust the page allocator and get linux into OOM conditions
> while there's plenty of space in CMA. But the kernel cannot use CMA for
> it's own allocations, only to back user pages, which we don't care about
> because our guest pages are covered by our hugetlb reserve :-)

Yes. Write that in the patch description and I'm happy ;).


Alex

Aneesh Kumar K.V May 6, 2014, 2:20 p.m. UTC | #9
Alexander Graf <agraf@suse.de> writes:

> On 06.05.14 09:19, Benjamin Herrenschmidt wrote:
>> On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
>>> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
>>>> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>>>>> Isn't this a greater problem? We should start swapping before we hit
>>>>> the point where non movable kernel allocation fails, no?
>>>> Possibly but the fact remains, this can be avoided by making sure that
>>>> if we create a CMA reserve for KVM, then it uses it rather than using
>>>> the rest of main memory for hash tables.
>>> So why were we preferring non-CMA memory before? Considering that Aneesh
>>> introduced that logic in fa61a4e3 I suppose this was just a mistake?
>> I assume so.

....
...

>>
>> Whatever remains is split between CMA and the normal page allocator.
>>
>> Without Aneesh latest patch, when creating guests, KVM starts allocating
>> it's hash tables from the latter instead of CMA (we never allocate from
>> hugetlb pool afaik, only guest pages do that, not hash tables).
>>
>> So we exhaust the page allocator and get linux into OOM conditions
>> while there's plenty of space in CMA. But the kernel cannot use CMA for
>> it's own allocations, only to back user pages, which we don't care about
>> because our guest pages are covered by our hugetlb reserve :-)
>
> Yes. Write that in the patch description and I'm happy ;).
>

How about the below:

Current KVM code first try to allocate hash page table from the normal
page allocator before falling back to the CMA reserve region. One of the
side effects of that is, we could exhaust the page allocator and get
linux into OOM conditions while we still have plenty of space in CMA. 

Fix this by trying the CMA reserve region first and then falling back
to normal page allocator if we fail to get enough memory from CMA
reserve area.

-aneesh

Alexander Graf May 6, 2014, 2:25 p.m. UTC | #10
On 05/06/2014 04:20 PM, Aneesh Kumar K.V wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 06.05.14 09:19, Benjamin Herrenschmidt wrote:
>>> On Tue, 2014-05-06 at 09:05 +0200, Alexander Graf wrote:
>>>> On 06.05.14 02:06, Benjamin Herrenschmidt wrote:
>>>>> On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote:
>>>>>> Isn't this a greater problem? We should start swapping before we hit
>>>>>> the point where non movable kernel allocation fails, no?
>>>>> Possibly but the fact remains, this can be avoided by making sure that
>>>>> if we create a CMA reserve for KVM, then it uses it rather than using
>>>>> the rest of main memory for hash tables.
>>>> So why were we preferring non-CMA memory before? Considering that Aneesh
>>>> introduced that logic in fa61a4e3 I suppose this was just a mistake?
>>> I assume so.
> ....
> ...
>
>>> Whatever remains is split between CMA and the normal page allocator.
>>>
>>> Without Aneesh latest patch, when creating guests, KVM starts allocating
>>> it's hash tables from the latter instead of CMA (we never allocate from
>>> hugetlb pool afaik, only guest pages do that, not hash tables).
>>>
>>> So we exhaust the page allocator and get linux into OOM conditions
>>> while there's plenty of space in CMA. But the kernel cannot use CMA for
>>> it's own allocations, only to back user pages, which we don't care about
>>> because our guest pages are covered by our hugetlb reserve :-)
>> Yes. Write that in the patch description and I'm happy ;).
>>
> How about the below:
>
> Current KVM code first try to allocate hash page table from the normal
> page allocator before falling back to the CMA reserve region. One of the
> side effects of that is, we could exhaust the page allocator and get
> linux into OOM conditions while we still have plenty of space in CMA.
>
> Fix this by trying the CMA reserve region first and then falling back
> to normal page allocator if we fail to get enough memory from CMA
> reserve area.

Fix the grammar (I've spotted a good number of mistakes), then this 
should do. Please also improve the headline.


Alex


Patch

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fb25ebc0af0c..f32896ffd784 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -52,7 +52,7 @@  static void kvmppc_rmap_reset(struct kvm *kvm);
 
 long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 {
-	unsigned long hpt;
+	unsigned long hpt = 0;
 	struct revmap_entry *rev;
 	struct page *page = NULL;
 	long order = KVM_DEFAULT_HPT_ORDER;
@@ -64,22 +64,11 @@  long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	kvm->arch.hpt_cma_alloc = 0;
-	/*
-	 * try first to allocate it from the kernel page allocator.
-	 * We keep the CMA reserved for failed allocation.
-	 */
-	hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT |
-			       __GFP_NOWARN, order - PAGE_SHIFT);
-
-	/* Next try to allocate from the preallocated pool */
-	if (!hpt) {
-		VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
-		page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
-		if (page) {
-			hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
-			kvm->arch.hpt_cma_alloc = 1;
-		} else
-			--order;
+	VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
+	page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
+	if (page) {
+		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
+		kvm->arch.hpt_cma_alloc = 1;
 	}
 
 	/* Lastly try successively smaller sizes from the page allocator */
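
With the change applied, the allocation sequence in kvmppc_alloc_hpt() reads
roughly as in the sketch below: the CMA reserve is tried first, and the
pre-existing fallback loop (just past the diff context, assumed unchanged
here) still retries successively smaller sizes from the kernel page
allocator, so nothing regresses when the CMA area is exhausted or absent.

/*
 * Sketch of the post-patch allocation order; not the verbatim function,
 * which also sets up the reverse map and other HPT bookkeeping.
 */
static long hpt_alloc_order_sketch(struct kvm *kvm, long order)
{
	unsigned long hpt = 0;
	struct page *page;

	kvm->arch.hpt_cma_alloc = 0;

	/* First, the CMA area that was reserved for exactly this purpose. */
	VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER);
	page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT));
	if (page) {
		hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
		kvm->arch.hpt_cma_alloc = 1;
	}

	/* Lastly try successively smaller sizes from the page allocator. */
	while (!hpt && order > PPC_MIN_HPT_ORDER) {
		hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT |
				       __GFP_NOWARN, order - PAGE_SHIFT);
		if (!hpt)
			--order;
	}

	return hpt ? order : -ENOMEM;
}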