Message ID: CAKhg4tJjp3yymCTDFpCQJiekos3265AcuBMuCw5TkZUvjCvg1g@mail.gmail.com (mailing list archive)
State: New, archived
Series: VM boot failure on nodes not having DMA32 zone
On 24/07/2018 09:53, Liang C wrote:
> Hi,
>
> We have a situation where our qemu processes need to be launched under
> cgroup cpuset.mems control. This introduces an issue similar to one that
> was discussed a few years ago. The difference here is that, in our case,
> not being able to allocate from the DMA32 zone is the result of a cgroup
> restriction, not mempolicy enforcement. Here are the steps to reproduce
> the failure:
>
> mkdir /sys/fs/cgroup/cpuset/nodeX (where X is a node not having a DMA32 zone)
> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
> echo 1 > /sys/fs/cgroup/cpuset/nodeX/cpuset.mem_hardwall
> echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks
>
> # launch a virtual machine
> kvm_init_vcpu failed: Cannot allocate memory
>
> There are workarounds, such as always placing qemu processes on the
> node that has the DMA32 zone, or not restricting a qemu process's
> memory allocation until its DMA32 allocation has finished (difficult
> to time precisely). But we would like to find a way to address the
> root cause.
>
> Considering that the pae_root shadow table should not be needed when
> EPT is in use - which is indeed our case, as EPT is always available
> for us (and presumably for most other users as well) - we made a patch
> roughly like this:
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d594690..1d1b61e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>         vcpu->arch.mmu.translate_gpa = translate_gpa;
>         vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
>
> -       return alloc_mmu_pages(vcpu);
> +       return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
> }
>
> void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>
>
> It works in all of our test cases. But we would really like to have your
> insight on this patch before applying it in a production environment and
> contributing it back to the community. Thanks in advance for any help
> you may provide!

Yes, this looks good. However, I'd place the "if" in alloc_mmu_pages
itself.

Thanks,

Paolo
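For concreteness, a minimal sketch of what Paolo's suggestion (moving the
check into alloc_mmu_pages itself rather than at the call site) might look
like, assuming the mmu.c layout of that era; the actual submitted patch
may differ:

static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
{
        struct page *page;
        int i;

        /*
         * With two-dimensional paging (EPT/NPT), the pae_root shadow
         * table is never consulted, so the below-4GB allocation can
         * be skipped entirely.
         */
        if (tdp_enabled)
                return 0;

        /*
         * When emulating 32-bit mode, cr3 is only 32 bits even on
         * x86_64, so the shadow page table root must live in the
         * first 4GB of memory - hence __GFP_DMA32.
         */
        page = alloc_page(GFP_KERNEL | __GFP_DMA32);
        if (!page)
                return -ENOMEM;

        vcpu->arch.mmu.pae_root = page_address(page);
        for (i = 0; i < 4; ++i)
                vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;

        return 0;
}

Placing the check inside alloc_mmu_pages keeps the TDP special case next
to the comment explaining why the DMA32 allocation exists, instead of
leaving the call site responsible for knowing when the allocation is safe
to skip.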
Thank you very much for the quick reply and confirmation! I just made
and submitted a patch according to your advice.

Thanks,

Liang

On Wed, Jul 25, 2018 at 2:05 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 24/07/2018 09:53, Liang C wrote:
>> [... original report and patch quoted in full above ...]
>
> Yes, this looks good. However, I'd place the "if" in alloc_mmu_pages
> itself.
>
> Thanks,
>
> Paolo
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690..1d1b61e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
        vcpu->arch.mmu.translate_gpa = translate_gpa;
        vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;

-       return alloc_mmu_pages(vcpu);
+       return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
 }

 void kvm_mmu_setup(struct kvm_vcpu *vcpu)
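Since the patch only avoids the DMA32 allocation when TDP is enabled, a
deployment relying on it would want to confirm that EPT (or NPT on AMD)
is actually active on the host. One way to check, assuming the mainline
KVM module parameter names:

cat /sys/module/kvm_intel/parameters/ept   # Intel: "Y" means EPT is in use
cat /sys/module/kvm_amd/parameters/npt     # AMD: "1" means NPT is in use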