
[1/2] KVM: MMU: Mark sp mmio cached when creating mmio spte

Message ID 20130312174440.5d5199ee.yoshikawa_takuya_b1@lab.ntt.co.jp (mailing list archive)
State New, archived

Commit Message

Takuya Yoshikawa March 12, 2013, 8:44 a.m. UTC
This will be used to avoid zapping unrelated mmu pages when creating/moving
a memory slot later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

Comments

Xiao Guangrong March 13, 2013, 5:06 a.m. UTC | #1
On 03/12/2013 04:44 PM, Takuya Yoshikawa wrote:
> This will be used to avoid zapping unrelated mmu pages when creating/moving
> a memory slot later.

How about saving all mmio sptes into an mmio-rmap?

The good things are:
- instead of walking all shadow pages, we only need to walk the rmap
- compared to zapping a shadow page, it does not need to flush the TLB after
  zapping mmio sptes
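
A minimal sketch of the idea (kvm->arch.mmio_rmap is a made-up field and
this is a sketch, not real code; pte_list_add(), rmap_get_first() and
drop_spte() are existing mmu.c helpers):

	/* In mark_mmio_spte(): chain every new mmio spte into one global list. */
	pte_list_add(vcpu, sptep, &vcpu->kvm->arch.mmio_rmap);

	/* On memslot create/move: walk only the mmio sptes. */
	static void kvm_mmu_zap_all_mmio_sptes(struct kvm *kvm)
	{
		struct rmap_iterator iter;
		u64 *sptep;

		/* re-fetch the head each time; drop_spte() unlinks the entry */
		while ((sptep = rmap_get_first(kvm->arch.mmio_rmap, &iter)))
			drop_spte(kvm, sptep);
		/* no TLB flush needed: a stale mmio spte just faults again */
	}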

Takuya Yoshikawa March 13, 2013, 7:28 a.m. UTC | #2
On Wed, 13 Mar 2013 13:06:23 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> On 03/12/2013 04:44 PM, Takuya Yoshikawa wrote:
> > This will be used to avoid zapping unrelated mmu pages when creating/moving
> > a memory slot later.
> 
> How about saving all mmio sptes into an mmio-rmap?

The problem is that other mmu code would need to care about the pointers
stored in the new rmap list: when mmu_shrink zaps shadow pages for example.

Maybe worth thinking about, but as a first step I want to have a simple,
back-portable patch for distributors: note that creating a memory slot can
happen many times for some guest configurations, since QEMU does strange
things when re-mapping some regions IIRC.

> 
> The good things are:
> - instead of walking all shadow pages, we only need to walk the rmap

Traversing the active list does not take that long compared to the other
work needed to zap pages: on the order of us, not ms.  But I'm now
preparing additional work to avoid the "goto restart" after deleting
entries.  That will at least help us avoid traversing more than once.

> - compared to zapping a shadow page, it does not need to flush the TLB after
>   zapping mmio sptes

If we check each spte in the sp, we can achieve a similar goal, much like
the old remove_write_access() code.  I implemented such code but have not
seen a clear improvement yet.  There will be pros and cons.
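
For illustration, a rough sketch of that per-sp check (the function name is
hypothetical; is_mmio_spte() and mmu_spte_clear_no_track() are existing
mmu.c helpers):

	/* Drop only the mmio sptes in one shadow page, keeping the rest,
	 * so the sp itself need not be zapped. */
	static void zap_mmio_sptes_in_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
	{
		int i;

		for (i = 0; i < PT64_ENT_PER_PAGE; ++i)	/* 512 entries */
			if (is_mmio_spte(sp->spt[i]))
				mmu_spte_clear_no_track(&sp->spt[i]);
	}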

Thanks,
	Takuya
Xiao Guangrong March 13, 2013, 7:42 a.m. UTC | #3
On 03/13/2013 03:28 PM, Takuya Yoshikawa wrote:
> On Wed, 13 Mar 2013 13:06:23 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> On 03/12/2013 04:44 PM, Takuya Yoshikawa wrote:
>>> This will be used to avoid zapping unrelated mmu pages when creating/moving
>>> a memory slot later.
>>
>> How about saving all mmio sptes into an mmio-rmap?
> 
> The problem is that other mmu code would need to care about the pointers
> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.

It is not hard... all the code is already wrapped by *zap_spte*.

> 
> Maybe worth thinking about, but as a first step I want to have a simple,
> back-portable patch for distributors: note that creating a memory slot can
> happen many times for some guest configurations, since QEMU does strange
> things when re-mapping some regions IIRC.

Hmm, that means memslots also need to be deleted frequently; this patch
cannot help much with the deletion case.

> 
>>
>> The good things are:
>> - instead of walking all shadow pages, we only need to walk the rmap
> 
> Traversing the active list does not take that long compared to the other
> work needed to zap pages: on the order of us, not ms.  But I'm now

Walking the shadow pages depends on how much memory the guest uses...

> preparing additional work to avoid the "goto restart" after deleting
> entries.  That will at least help us avoid traversing more than once.

If you drop the walking, you need not care about the "goto" stuff anymore...

> 
>> - compared to zapping a shadow page, it does not need to flush the TLB after
>>   zapping mmio sptes
> 
> If we check each spte in the sp, we can achieve a similar goal, much like
> the old remove_write_access() code.  I implemented such code but have not
> seen a clear improvement yet.  There will be pros and cons.

Checking every entry (all 512) in the shadow page is bad...

> 
> Thanks,
> 	Takuya
Gleb Natapov March 13, 2013, 12:33 p.m. UTC | #4
On Wed, Mar 13, 2013 at 03:42:18PM +0800, Xiao Guangrong wrote:
> On 03/13/2013 03:28 PM, Takuya Yoshikawa wrote:
> > On Wed, 13 Mar 2013 13:06:23 +0800
> > Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> > 
> >> On 03/12/2013 04:44 PM, Takuya Yoshikawa wrote:
> >>> This will be used to avoid zapping unrelated mmu pages when creating/moving
> >>> a memory slot later.
> >>
> >> How about saving all mmio sptes into an mmio-rmap?
> > 
> > The problem is that other mmu code would need to care about the pointers
> > stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
> 
> It is not hard... all the code is already wrapped by *zap_spte*.
> 
So are you going to send a patch? What do you think about applying this
as a temporary solution?

--
			Gleb.
Xiao Guangrong March 13, 2013, 12:42 p.m. UTC | #5
On 03/13/2013 08:33 PM, Gleb Natapov wrote:
> On Wed, Mar 13, 2013 at 03:42:18PM +0800, Xiao Guangrong wrote:
>> On 03/13/2013 03:28 PM, Takuya Yoshikawa wrote:
>>> On Wed, 13 Mar 2013 13:06:23 +0800
>>> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
>>>
>>>> On 03/12/2013 04:44 PM, Takuya Yoshikawa wrote:
>>>>> This will be used to avoid zapping unrelated mmu pages when creating/moving
>>>>> a memory slot later.
>>>>
>>>> How about saving all mmio sptes into an mmio-rmap?
>>>
>>> The problem is that other mmu code would need to care about the pointers
>>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
>>
>> It is not hard... all the code is already wrapped by *zap_spte*.
>>
> So are you going to send a patch? What do you think about applying this
> as a temporary solution?

Hi Gleb,

Since it only needs a small change based on this patch, I think we can directly
apply the rmap-based way.

Takuya, could you please do this? ;)

Takuya Yoshikawa March 13, 2013, 1:40 p.m. UTC | #6
On Wed, 13 Mar 2013 20:42:41 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> >>>> How about saving all mmio sptes into an mmio-rmap?
> >>>
> >>> The problem is that other mmu code would need to care about the pointers
> >>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
> >>
> >> It is not hard... all the code is already wrapped by *zap_spte*.
> >>
> > So are you going to send a patch? What do you think about applying this
> > as a temporary solution?
> 
> Hi Gleb,
> 
> Since it only needs a small change based on this patch, I think we can directly
> apply the rmap-based way.
> 
> Takuya, could you please do this? ;)

Though I'm fine with making the patch better myself, I'm still thinking
about its downsides.

In zap_spte, don't we need to search the pointer to be removed from the
global mmio-rmap list?  How long can that list be?

Implementing it may not be difficult, but I'm not sure whether we would
get a pure improvement.  Unless it becomes 99% certain, I think we should
take a basic approach first.
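
For reference, the search in question is the linear scan in the pte_list
code; a simplified sketch (pte_list_find() is a hypothetical name, and the
real code also handles a single-entry encoding and compacts descriptors):

	static bool pte_list_find(u64 *sptep, unsigned long *pte_list)
	{
		struct pte_list_desc *desc;
		int i;

		/* O(list length): each descriptor holds only a few sptes. */
		for (desc = (struct pte_list_desc *)(*pte_list & ~1ul);
		     desc; desc = desc->more)
			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
				if (desc->sptes[i] == sptep)
					return true;
		return false;
	}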

What do you think?

Thanks,
	Takuya
Xiao Guangrong March 13, 2013, 2:05 p.m. UTC | #7
On 03/13/2013 09:40 PM, Takuya Yoshikawa wrote:
> On Wed, 13 Mar 2013 20:42:41 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>>>>>> How about saving all mmio sptes into an mmio-rmap?
>>>>>
>>>>> The problem is that other mmu code would need to care about the pointers
>>>>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
>>>>
>>>> It is not hard... all the code is already wrapped by *zap_spte*.
>>>>
>>> So are you going to send a patch? What do you think about applying this
>>> as a temporary solution?
>>
>> Hi Gleb,
>>
>> Since it only needs a small change based on this patch, I think we can directly
>> apply the rmap-based way.
>>
>> Takuya, could you please do this? ;)
> 
> Though I'm fine with making the patch better myself, I'm still thinking
> about its downsides.
> 
> In zap_spte, don't we need to search the pointer to be removed from the
> global mmio-rmap list?  How long can that list be?

It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).

The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
can be sped up after applying my previous patch:
KVM: MMU: fast drop all spte on the pte_list

> 
> Implementing it will/may not be difficult but I'm not sure if we would
> get pure improvement.  Unless it becomes 99% sure, I think we should
> first take a basic approach.

I am definitely sure zapping all mmio sptes is faster than zapping mmio shadow
pages. ;)

> 
> What do you think?

I am wondering: if zapping all shadow pages is fast enough (after my patchset), do
we really need to care about it?

Marcelo Tosatti March 14, 2013, 1:58 a.m. UTC | #8
On Wed, Mar 13, 2013 at 10:05:20PM +0800, Xiao Guangrong wrote:
> On 03/13/2013 09:40 PM, Takuya Yoshikawa wrote:
> > On Wed, 13 Mar 2013 20:42:41 +0800
> > Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> > 
> >>>>>> How about saving all mmio sptes into an mmio-rmap?
> >>>>>
> >>>>> The problem is that other mmu code would need to care about the pointers
> >>>>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
> >>>>
> >>>> It is not hard... all the code is already wrapped by *zap_spte*.
> >>>>
> >>> So are you going to send a patch? What do you think about applying this
> >>> as a temporary solution?
> >>
> >> Hi Gleb,
> >>
> >> Since it only needs a small change based on this patch, I think we can directly
> >> apply the rmap-based way.
> >>
> >> Takuya, could you please do this? ;)
> > 
> > Though I'm fine with making the patch better myself, I'm still thinking
> > about its downsides.
> > 
> > In zap_spte, don't we need to search the pointer to be removed from the
> > global mmio-rmap list?  How long can that list be?
> 
> It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
> On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).
> 
> The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
> can be sped up after applying my previous patch:
> KVM: MMU: fast drop all spte on the pte_list
> 
> > 
> > Implementing it may not be difficult, but I'm not sure whether we would
> > get a pure improvement.  Unless it becomes 99% certain, I think we should
> > take a basic approach first.
> 
> I am definitely sure zapping all mmio sptes is faster than zapping mmio shadow
> pages. ;)

With a huge number of shadow pages (think 512GB guest, 262144 pte-level
shadow pages to map), it might be a problem.
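(Each pte-level shadow page maps 512 x 4KB = 2MB of guest memory, so a
512GB guest needs 512GB / 2MB = 262144 of them.)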

> > What do you think?
> 
> I am wondering: if zapping all shadow pages is fast enough (after my patchset), do
> we really need to care about it?

Still needed: your patch reduces kvm_mmu_zap_all() time, but as you can
see, with huge-memory guests even a 100% improvement over the current
situation will still be a bottleneck (and as you noted, the deletion case is
still unsolved).

I suppose another improvement angle is to zap only what's necessary for the
given operation (say, there is the memslot hint available, but unused for
x86).

Takuya Yoshikawa March 14, 2013, 2:26 a.m. UTC | #9
On Wed, 13 Mar 2013 22:58:21 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > > In zap_spte, don't we need to search the pointer to be removed from the
> > > global mmio-rmap list?  How long can that list be?
> > 
> > It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
> > On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).

mmu_shrink() is an exception.

> > 
> > The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
> > can be sped up after applying my previous patch:
> > KVM: MMU: fast drop all spte on the pte_list

My point is that other code paths may need to care more about latency.

Zapping all mmio sptes can happen only when changing memory regions:
not so latency-critical, but it should be reasonably fast so as not to hold
the mmu_lock for a (too) long time.

Compared to that, mmu_shrink() may be called at any time, and adding
more work to it should be avoided IMO.  It should return ASAP.

In general, we should try hard to keep ourselves from affecting
unrelated code paths when optimizing something.  The global pte
list is something which can affect many code paths in the future.


So, I'm fine with trying mmio-rmap once we can actually measure
very long mmu_lock hold times caused by traversing shadow pages.

How about applying this first and then seeing the effect on big guests?

Thanks,
	Takuya


> > > Implementing it may not be difficult, but I'm not sure whether we would
> > > get a pure improvement.  Unless it becomes 99% certain, I think we should
> > > take a basic approach first.
> > 
> > I am definitely sure zapping all mmio sptes is faster than zapping mmio shadow
> > pages. ;)
> 
> With a huge number of shadow pages (think 512GB guest, 262144 pte-level
> shadow pages to map), it might be a problem.
> 
> > > What do you think?
> > 
> > I am wondering: if zapping all shadow pages is fast enough (after my patchset), do
> > we really need to care about it?
> 
> Still needed: your patch reduces kvm_mmu_zap_all() time, but as you can
> see, with huge-memory guests even a 100% improvement over the current
> situation will still be a bottleneck (and as you noted, the deletion case is
> still unsolved).
> 
> I suppose another improvement angle is to zap only what's necessary for the
> given operation (say, there is the memslot hint available, but unused for
> x86).
Marcelo Tosatti March 14, 2013, 2:39 a.m. UTC | #10
On Thu, Mar 14, 2013 at 11:26:41AM +0900, Takuya Yoshikawa wrote:
> On Wed, 13 Mar 2013 22:58:21 -0300
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> > > > In zap_spte, don't we need to search the pointer to be removed from the
> > > > global mmio-rmap list?  How long can that list be?
> > > 
> > > It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
> > > On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).
> 
> mmu_shrink() is an exception.
> 
> > > 
> > > The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
> > > can be sped up after applying my previous patch:
> > > KVM: MMU: fast drop all spte on the pte_list
> 
> My point is that other code paths may need to care more about latency.
> 
> Zapping all mmio sptes can happen only when changing memory regions:
> not so latency-critical, but it should be reasonably fast so as not to hold
> the mmu_lock for a (too) long time.
> 
> Compared to that, mmu_shrink() may be called at any time, and adding
> more work to it should be avoided IMO.  It should return ASAP.

Good point.

> In general, we should try hard to keep ourselves from affecting
> unrelated code paths when optimizing something.  The global pte
> list is something which can affect many code paths in the future.
> 
> 
> So, I'm fine with trying mmio-rmap once we can actually measure
> very long mmu_lock hold times caused by traversing shadow pages.
> 
> How about applying this first and then seeing the effect on big guests?

Works for me. Xiao?

Xiao Guangrong March 14, 2013, 5:13 a.m. UTC | #11
On 03/14/2013 09:58 AM, Marcelo Tosatti wrote:
> On Wed, Mar 13, 2013 at 10:05:20PM +0800, Xiao Guangrong wrote:
>> On 03/13/2013 09:40 PM, Takuya Yoshikawa wrote:
>>> On Wed, 13 Mar 2013 20:42:41 +0800
>>> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
>>>
>>>>>>>> How about saving all mmio sptes into an mmio-rmap?
>>>>>>>
>>>>>>> The problem is that other mmu code would need to care about the pointers
>>>>>>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
>>>>>>
>>>>>> It is not hard... all the code is already wrapped by *zap_spte*.
>>>>>>
>>>>> So are you going to send a patch? What do you think about applying this
>>>>> as a temporary solution?
>>>>
>>>> Hi Gleb,
>>>>
>>>> Since it only needs a small change based on this patch, I think we can directly
>>>> apply the rmap-based way.
>>>>
>>>> Takuya, could you please do this? ;)
>>>
>>> Though I'm fine with making the patch better myself, I'm still thinking
>>> about its downsides.
>>>
>>> In zap_spte, don't we need to search the pointer to be removed from the
>>> global mmio-rmap list?  How long can that list be?
>>
>> It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
>> On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).
>>
>> The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
>> can be sped up after applying my previous patch:
>> KVM: MMU: fast drop all spte on the pte_list
>>
>>>
>>> Implementing it may not be difficult, but I'm not sure whether we would
>>> get a pure improvement.  Unless it becomes 99% certain, I think we should
>>> take a basic approach first.
>>
>> I am definitely sure zapping all mmio sptes is faster than zapping mmio shadow
>> pages. ;)
> 
> With a huge number of shadow pages (think 512GB guest, 262144 pte-level
> shadow pages to map), it might be a problem.

That is one of the reasons why I think zapping mmio shadow pages is not good. ;)

This patch needs to walk all shadow pages to find all the mmio shadow pages
and zap them, so it depends on how much memory the guest uses (huge memory
means huge numbers of shadow pages, as you said).  But the time of zapping
mmio sptes is constant, no matter how much memory is used.

> 
>>> What do you think?
>>
>> I am wondering: if zapping all shadow pages is fast enough (after my patchset), do
>> we really need to care about it?
> 
> Still needed: your patch reduces kvm_mmu_zap_all() time, but as you can
> see, with huge-memory guests even a 100% improvement over the current
> situation will still be a bottleneck (and as you noted, the deletion case is
> still unsolved).	

The improvement can be greater if more memory is used.  (I only used 2G of
memory in the guest since my test case is a 32-bit program which cannot use
huge memory, and there was no lock contention in my test case.)

Actually, the time complexity of the current kvm_mmu_zap_all is the same as zapping
mmio shadow pages under the mmu-lock (O(n), where n is the number of shadow pages).
Both of them walk all shadow pages.  The rest of the work of kvm_mmu_zap is
constant.

And this is a TODO item:
(2): free shadow pages by using a generation-number
After that, kvm_mmu_zap need not walk all shadow pages anymore.

> 
> I suppose another improvement angle is to zap only what's necessary for the
> given operation (say, there is the memslot hint available, but unused for
> x86).

Yes, I agree on this point.  Zapping all shadow pages makes vcpus fault
on all memory accesses.  This is the shortcoming.

Xiao Guangrong March 14, 2013, 5:36 a.m. UTC | #12
On 03/14/2013 10:39 AM, Marcelo Tosatti wrote:
> On Thu, Mar 14, 2013 at 11:26:41AM +0900, Takuya Yoshikawa wrote:
>> On Wed, 13 Mar 2013 22:58:21 -0300
>> Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>
>>>>> In zap_spte, don't we need to search the pointer to be removed from the
>>>>> global mmio-rmap list?  How long can that list be?
>>>>
>>>> It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
>>>> On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).
>>
>> mmu_shrink() is an exception.
>>
>>>>
>>>> The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
>>>> can be sped up after applying my previous patch:
>>>> KVM: MMU: fast drop all spte on the pte_list
>>
>> My point is that other code paths may need to care more about latency.
>>
>> Zapping all mmio sptes can happen only when changing memory regions:
>> not so latency-critical, but it should be reasonably fast so as not to hold
>> the mmu_lock for a (too) long time.
>>
>> Compared to that, mmu_shrink() may be called at any time, and adding
>> more work to it should be avoided IMO.  It should return ASAP.

Hmm? How frequent is mmu_shrink? Well, it can be heavy sometimes, but that
is not the case during normal running.
How many mmio shadow pages do we have in the system? Not many, especially
on a virtio-supported guest.

And if it is a real problem, it is worthwhile to optimize, since the
situation is even worse for the normal page rmap on the shadow mmu.

I have an idea to avoid holding the mmu-lock, which I mentioned in a
previous mail: cache a generation-number into the mmio spte.  When zapping
mmio sptes is needed, we can simply increase the global generation-number.
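
A rough sketch of that idea (all names here are made up; a real encoding
would have to use spare spte bits that do not collide with the gfn and
access bits):

	/* Stamp each new mmio spte with the current generation. */
	spte |= (kvm->arch.mmio_gen & MMIO_GEN_MASK) << MMIO_GEN_SHIFT;

	/* "Zapping" all mmio sptes becomes a single increment... */
	static void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
	{
		kvm->arch.mmio_gen++;
	}

	/* ...and a stale spte is detected and recreated on its next fault. */
	static bool mmio_spte_is_stale(struct kvm *kvm, u64 spte)
	{
		u64 gen = (spte >> MMIO_GEN_SHIFT) & MMIO_GEN_MASK;

		return gen != (kvm->arch.mmio_gen & MMIO_GEN_MASK);
	}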

> 
> Good point.
> 
>> In general, we should try hard to keep ourselves from affecting
>> unrelated code paths when optimizing something.  The global pte
>> list is something which can affect many code paths in the future.
>>
>>
>> So, I'm fine with trying mmio-rmap once we can actually measure
>> very long mmu_lock hold times caused by traversing shadow pages.
>>
>> How about applying this first and then seeing the effect on big guests?
> 
> Works for me. Xiao?

Marcelo, I do not insist on it. ;)


Xiao Guangrong March 14, 2013, 5:45 a.m. UTC | #13
On 03/14/2013 01:13 PM, Xiao Guangrong wrote:
> On 03/14/2013 09:58 AM, Marcelo Tosatti wrote:
>> On Wed, Mar 13, 2013 at 10:05:20PM +0800, Xiao Guangrong wrote:
>>> On 03/13/2013 09:40 PM, Takuya Yoshikawa wrote:
>>>> On Wed, 13 Mar 2013 20:42:41 +0800
>>>> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
>>>>
>>>>>>>>> How about saving all mmio sptes into an mmio-rmap?
>>>>>>>>
>>>>>>>> The problem is that other mmu code would need to care about the pointers
>>>>>>>> stored in the new rmap list: when mmu_shrink zaps shadow pages for example.
>>>>>>>
>>>>>>> It is not hard... all the code is already wrapped by *zap_spte*.
>>>>>>>
>>>>>> So are you going to send a patch? What do you think about applying this
>>>>>> as a temporary solution?
>>>>>
>>>>> Hi Gleb,
>>>>>
>>>>> Since it only needs a small change based on this patch, I think we can directly
>>>>> apply the rmap-based way.
>>>>>
>>>>> Takuya, could you please do this? ;)
>>>>
>>>> Though I'm fine with making the patch better myself, I'm still thinking
>>>> about its downsides.
>>>>
>>>> In zap_spte, don't we need to search the pointer to be removed from the
>>>> global mmio-rmap list?  How long can that list be?
>>>
>>> It is not bad.  On softmmu, the rmap list can already grow longer than 300 entries.
>>> On hardmmu, mmio sptes are normally not zapped frequently (just set, not cleared).
>>>
>>> The worst case is zap-all-mmio-sptes, which removes every mmio spte.  This operation
>>> can be sped up after applying my previous patch:
>>> KVM: MMU: fast drop all spte on the pte_list
>>>
>>>>
>>>> Implementing it may not be difficult, but I'm not sure whether we would
>>>> get a pure improvement.  Unless it becomes 99% certain, I think we should
>>>> take a basic approach first.
>>>
>>> I am definitely sure zapping all mmio sptes is faster than zapping mmio shadow
>>> pages. ;)
>>
>> With a huge number of shadow pages (think 512GB guest, 262144 pte-level
>> shadow pages to map), it might be a problem.
> 
> That is one of the reasons why I think zapping mmio shadow pages is not good. ;)
> 
> This patch needs to walk all shadow pages to find all the mmio shadow pages
> and zap them, so it depends on how much memory the guest uses (huge memory
> means huge numbers of shadow pages, as you said).  But the time of zapping
> mmio sptes is constant, no matter how much memory is used.
> 
>>
>>>> What do you think?
>>>
>>> I am wondering: if zapping all shadow pages is fast enough (after my patchset), do
>>> we really need to care about it?
>>
>> Still needed: your patch reduces kvm_mmu_zap_all() time, but as you can
>> see, with huge-memory guests even a 100% improvement over the current
>> situation will still be a bottleneck (and as you noted, the deletion case is
>> still unsolved).	
> 
> The improvement can be greater if more memory is used.  (I only used 2G of
> memory in the guest since my test case is a 32-bit program which cannot use
> huge memory, and there was no lock contention in my test case.)
> 
> Actually, the time complexity of the current kvm_mmu_zap_all is the same as zapping

Sorry, not the *current* way: it is the optimized way in my patchset.

Takuya Yoshikawa March 16, 2013, 2:01 a.m. UTC | #14
[ I'm still reading your patches, so please forgive me if I'm wrong. ]

On Thu, 14 Mar 2013 13:13:30 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> Actually, the time complexity of the current kvm_mmu_zap_all is the same as zapping
> mmio shadow pages under the mmu-lock (O(n), where n is the number of shadow pages).
> Both of them walk all shadow pages.  The rest of the work of kvm_mmu_zap is
> constant.

Clearing the rmap arrays using memset cannot be constant.
It's proportional to the number of guest pages (not shadow pages).
I guess we can treat it as practically constant for all cases,
so I think your optimization is great!

But anyway, it's worth remembering that the arrays can be very long.
512GB means 128M pages.  Clearing 1GB of memory will not take too long(?)...
So my guess is that your method can cover most of the use cases we can
think of now.
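
(Roughly: 512GB / 4KB = 128M guest pages; assuming 8 bytes per rmap slot,
that is 128M * 8 = 1GB of rmap arrays to clear.)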

Thanks,
	Takuya

> 
> And this is a TODO item:
> (2): free shadow pages by using a generation-number
> After that, kvm_mmu_zap need not walk all shadow pages anymore.

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 635a74d..b84310a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -230,6 +230,7 @@  struct kvm_mmu_page {
 #endif
 
 	int write_flooding_count;
+	bool mmio_cached;
 };
 
 struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fdacabb..de45ec1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -199,8 +199,11 @@  EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
 static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
 {
+	struct kvm_mmu_page *sp =  page_header(__pa(sptep));
+
 	access &= ACC_WRITE_MASK | ACC_USER_MASK;
 
+	sp->mmio_cached = true;
 	trace_mark_mmio_spte(sptep, gfn, access);
 	mmu_spte_set(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
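
For context, a sketch of how the new flag might be consumed later when a
memslot is created or moved (kvm_mmu_zap_mmio_cached_pages() is a
hypothetical name; this is not part of the posted diff):

/* Zap only the shadow pages that ever cached an mmio spte, instead of
 * zapping every shadow page with kvm_mmu_zap_all(). */
static void kvm_mmu_zap_mmio_cached_pages(struct kvm *kvm)
{
	struct kvm_mmu_page *sp, *node;
	LIST_HEAD(invalid_list);

	spin_lock(&kvm->mmu_lock);
restart:
	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
		if (!sp->mmio_cached)
			continue;
		/* nonzero return means the list was changed under us */
		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
			goto restart;
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);
}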
 }