diff mbox

[2/2] kvm: powerpc: set cache coherency only for kernel managed pages

Message ID 51EF3B67.5060403@windriver.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tiejun Chen July 24, 2013, 2:26 a.m. UTC
On 07/18/2013 06:27 PM, Alexander Graf wrote:
>
> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>
>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>
>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>>
>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>
>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Bhushan Bharat-R65777
>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>> To: '"?tiejun.chen?"'
>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>>>> B07421
>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>>>> managed pages
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "?tiejun.chen?" [mailto:tiejun.chen@windriver.com]
>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>>>> Scott-
>>>>>>>> B07421
>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>> kernel managed pages
>>>>>>>>
>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "?tiejun.chen?"
>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>> Wood
>>>>>>>>>> Scott-
>>>>>>>>>> B07421
>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>>> kernel managed pages
>>>>>>>>>>
>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: "?tiejun.chen?" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>>> Wood
>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>>>> for kernel managed pages
>>>>>>>>>>>>
>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>>>> else this is treated as I/O and we set  "I + G"  (cache
>>>>>>>>>>>>> inhibited,
>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>
>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>     arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
>>>>>>>>>>>>>     1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>     	return mas3;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>>>>     {
>>>>>>>>>>>>> +	u32 mas2_attr;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {
>>>>>>>>>>>>
>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>
>>>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>>>> that it
>>>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>>>> is DDR.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>>>> so,
>>>>>>>>>>
>>>>>>>>>>       KVM: direct mmio pfn check
>>>>>>>>>>
>>>>>>>>>>       Userspace may specify memory slots that are backed by mmio
>>>>>>>>>> pages rather than
>>>>>>>>>>       normal RAM.  In some cases it is not enough to identify these
>>>>>>>>>> mmio
>>>>>>>> pages
>>>>>>>>>>       by pfn_valid().  This patch adds checking the PageReserved as well.
>>>>>>>>>
>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>> PageReserved helps in
>>>>>>>> those cases?
>>>>>>>>
>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>>>> be chronically persistent as I understand ;-)
>>>>>>>
>>>>>>> Then I will wait till someone educate me :)
>>>>>>
>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>>>
>>>>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>>
>>>>>    1) Non cache coherent DMA
>>>>>    2) Memory hot remove
>>>>>
>>>>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>>>>
>>>>>          depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>>>          default n if PPC_47x
>>>>>          default y
>>>>>
>>>>> so we never hit it with any core we care about ;).
>>>>>
>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>>>>
>>>> Thanks for this good information :)
>>>>
>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>>>>
>>>> If I'm wrong please correct me :)
>>>
>>> You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
>>>
>>> I'd rather not like to break x86 :).
>>>
>>> However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
>>>
>>
>> Often what case should be adopted to validate this scenario?
>
> Something which hammers the TLB emulation heavily. I usually just run /bin/echo a thousand times in "time" and see how long it takes ;)
>

I tried to run five times with this combination, "time `for ((i=0; i<5000; 
i++));  do /bin/echo; done`", to calculate the average value with this change:


Before apply this change:

real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s

After apply this change:

real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s

So,

real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%

Note I only boot one VM.

Tiejun

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexander Graf July 24, 2013, 8:25 a.m. UTC | #1
On 24.07.2013, at 04:26, “tiejun.chen” wrote:

> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>> 
>> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>> 
>>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>>> 
>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>> 
>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bhushan Bharat-R65777
>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>>> To: '"?tiejun.chen?"'
>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood Scott-
>>>>>>>> B07421
>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>>>>>>>> managed pages
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "?tiejun.chen?" [mailto:tiejun.chen@windriver.com]
>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de; Wood
>>>>>>>>> Scott-
>>>>>>>>> B07421
>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>> kernel managed pages
>>>>>>>>> 
>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of "?tiejun.chen?"
>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>> Wood
>>>>>>>>>>> Scott-
>>>>>>>>>>> B07421
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
>>>>>>>>>>> kernel managed pages
>>>>>>>>>>> 
>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: "?tiejun.chen?" [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org; agraf@suse.de;
>>>>>>>>>>>>> Wood
>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>>>>> for kernel managed pages
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>>> If there is a struct page for the requested mapping then it's
>>>>>>>>>>>>>> normal DDR and the mapping sets "M" bit (coherent, cacheable)
>>>>>>>>>>>>>> else this is treated as I/O and we set  "I + G"  (cache
>>>>>>>>>>>>>> inhibited,
>>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned device
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>    arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
>>>>>>>>>>>>>>    1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>    	return mas3;
>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
>>>>>>>>>>>>>>    {
>>>>>>>>>>>>>> +	u32 mas2_attr;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>> 
>>>>>>>>>>>> What I understand from this function (someone can correct me) is
>>>>>>>>>>>> that it
>>>>>>>>>>> returns "false" when the page is managed by kernel and is not
>>>>>>>>>>> marked as RESERVED (for some reason). For us it does not matter
>>>>>>>>>>> whether the page is reserved or not, if it is kernel visible page then it
>>>>>>>> is DDR.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages, right? If
>>>>>>>>>>> so,
>>>>>>>>>>> 
>>>>>>>>>>>      KVM: direct mmio pfn check
>>>>>>>>>>> 
>>>>>>>>>>>      Userspace may specify memory slots that are backed by mmio
>>>>>>>>>>> pages rather than
>>>>>>>>>>>      normal RAM.  In some cases it is not enough to identify these
>>>>>>>>>>> mmio
>>>>>>>>> pages
>>>>>>>>>>>      by pfn_valid().  This patch adds checking the PageReserved as well.
>>>>>>>>>> 
>>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>>> PageReserved helps in
>>>>>>>>> those cases?
>>>>>>>>> 
>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this should
>>>>>>>>> be chronically persistent as I understand ;-)
>>>>>>>> 
>>>>>>>> Then I will wait till someone educate me :)
>>>>>>> 
>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary.
>>>>>> 
>>>>>> It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>>> 
>>>>>>   1) Non cache coherent DMA
>>>>>>   2) Memory hot remove
>>>>>> 
>>>>>> The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by:
>>>>>> 
>>>>>>         depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>>>>         default n if PPC_47x
>>>>>>         default y
>>>>>> 
>>>>>> so we never hit it with any core we care about ;).
>>>>>> 
>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either.
>>>>> 
>>>>> Thanks for this good information :)
>>>>> 
>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss.
>>>>> 
>>>>> If I'm wrong please correct me :)
>>>> 
>>>> You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us? In that case we potentially break x86.
>>>> 
>>>> I'd rather not like to break x86 :).
>>>> 
>>>> However, it'd be very interesting to see a benchmark with this change. Do you think you could just rip out the whole reserved check and run a few benchmarks and show us the results?
>>>> 
>>> 
>>> Often what case should be adopted to validate this scenario?
>> 
>> Something which hammers the TLB emulation heavily. I usually just run /bin/echo a thousand times in "time" and see how long it takes ;)
>> 
> 
> I tried to run five times with this combination, "time `for ((i=0; i<5000; i++));  do /bin/echo; done`", to calculate the average value with this change:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 1580dd4..5e8635b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> 
> bool kvm_is_mmio_pfn(pfn_t pfn)
> {
> +#ifdef CONFIG_MEMORY_HOTPLUG

I'd feel safer if we narrow this down to e500.

> +       /*
> +        * Currently only in memory hot remove case we may still need this.
> +        */
>        if (pfn_valid(pfn)) {

We still have to check for pfn_valid, no? So the #ifdef should be down here.

>                int reserved;
>                struct page *tail = pfn_to_page(pfn);
> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>                }
>                return PageReserved(tail);
>        }
> +#endif
> 
>        return true;
> }
> 
> Before apply this change:
> 
> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> 
> After apply this change:
> 
> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> 
> So,
> 
> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%

Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.

Gleb, Paolo, any hard feelings?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bharat Bhushan July 24, 2013, 9:11 a.m. UTC | #2
> -----Original Message-----

> From: Alexander Graf [mailto:agraf@suse.de]

> Sent: Wednesday, July 24, 2013 1:55 PM

> To: "“tiejun.chen”"

> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org list;

> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini

> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel

> managed pages

> 

> 

> On 24.07.2013, at 04:26, “tiejun.chen” wrote:

> 

> > On 07/18/2013 06:27 PM, Alexander Graf wrote:

> >>

> >> On 18.07.2013, at 12:19, “tiejun.chen” wrote:

> >>

> >>> On 07/18/2013 06:12 PM, Alexander Graf wrote:

> >>>>

> >>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:

> >>>>

> >>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:

> >>>>>>

> >>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:

> >>>>>>

> >>>>>>>

> >>>>>>>

> >>>>>>>> -----Original Message-----

> >>>>>>>> From: Bhushan Bharat-R65777

> >>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM

> >>>>>>>> To: '" tiejun.chen "'

> >>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;

> >>>>>>>> agraf@suse.de; Wood Scott-

> >>>>>>>> B07421

> >>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only

> >>>>>>>> for kernel managed pages

> >>>>>>>>

> >>>>>>>>

> >>>>>>>>

> >>>>>>>>> -----Original Message-----

> >>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]

> >>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM

> >>>>>>>>> To: Bhushan Bharat-R65777

> >>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;

> >>>>>>>>> agraf@suse.de; Wood

> >>>>>>>>> Scott-

> >>>>>>>>> B07421

> >>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency

> >>>>>>>>> only for kernel managed pages

> >>>>>>>>>

> >>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:

> >>>>>>>>>>

> >>>>>>>>>>

> >>>>>>>>>>> -----Original Message-----

> >>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org

> >>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tiejun.chen "

> >>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM

> >>>>>>>>>>> To: Bhushan Bharat-R65777

> >>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;

> >>>>>>>>>>> agraf@suse.de; Wood

> >>>>>>>>>>> Scott-

> >>>>>>>>>>> B07421

> >>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency

> >>>>>>>>>>> only for kernel managed pages

> >>>>>>>>>>>

> >>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:

> >>>>>>>>>>>>

> >>>>>>>>>>>>

> >>>>>>>>>>>>> -----Original Message-----

> >>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]

> >>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM

> >>>>>>>>>>>>> To: Bhushan Bharat-R65777

> >>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;

> >>>>>>>>>>>>> agraf@suse.de; Wood

> >>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777

> >>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency

> >>>>>>>>>>>>> only for kernel managed pages

> >>>>>>>>>>>>>

> >>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:

> >>>>>>>>>>>>>> If there is a struct page for the requested mapping then

> >>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent,

> >>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set  "I +

> >>>>>>>>>>>>>> G"  (cache inhibited,

> >>>>>>>>>>>>>> guarded)

> >>>>>>>>>>>>>>

> >>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned

> >>>>>>>>>>>>>> device

> >>>>>>>>>>>>>>

> >>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan

> >>>>>>>>>>>>>> <bharat.bhushan@freescale.com>

> >>>>>>>>>>>>>> ---

> >>>>>>>>>>>>>>    arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----

> >>>>>>>>>>>>>>    1 files changed, 12 insertions(+), 5 deletions(-)

> >>>>>>>>>>>>>>

> >>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c

> >>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c

> >>>>>>>>>>>>>> index 1c6a9d7..089c227 100644

> >>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c

> >>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c

> >>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32

> >>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int

> >>>>>>>>>>>>> usermode)

> >>>>>>>>>>>>>>    	return mas3;

> >>>>>>>>>>>>>>    }

> >>>>>>>>>>>>>>

> >>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int

> >>>>>>>>>>>>>> usermode)

> >>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,

> >>>>>>>>>>>>>> +pfn_t pfn)

> >>>>>>>>>>>>>>    {

> >>>>>>>>>>>>>> +	u32 mas2_attr;

> >>>>>>>>>>>>>> +

> >>>>>>>>>>>>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;

> >>>>>>>>>>>>>> +

> >>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {

> >>>>>>>>>>>>>

> >>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?

> >>>>>>>>>>>>

> >>>>>>>>>>>> What I understand from this function (someone can correct

> >>>>>>>>>>>> me) is that it

> >>>>>>>>>>> returns "false" when the page is managed by kernel and is

> >>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not

> >>>>>>>>>>> matter whether the page is reserved or not, if it is kernel

> >>>>>>>>>>> visible page then it

> >>>>>>>> is DDR.

> >>>>>>>>>>>>

> >>>>>>>>>>>

> >>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,

> >>>>>>>>>>> right? If so,

> >>>>>>>>>>>

> >>>>>>>>>>>      KVM: direct mmio pfn check

> >>>>>>>>>>>

> >>>>>>>>>>>      Userspace may specify memory slots that are backed by

> >>>>>>>>>>> mmio pages rather than

> >>>>>>>>>>>      normal RAM.  In some cases it is not enough to identify

> >>>>>>>>>>> these mmio

> >>>>>>>>> pages

> >>>>>>>>>>>      by pfn_valid().  This patch adds checking the PageReserved as

> well.

> >>>>>>>>>>

> >>>>>>>>>> Do you know what are those "some cases" and how checking

> >>>>>>>>>> PageReserved helps in

> >>>>>>>>> those cases?

> >>>>>>>>>

> >>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this

> >>>>>>>>> should be chronically persistent as I understand ;-)

> >>>>>>>>

> >>>>>>>> Then I will wait till someone educate me :)

> >>>>>>>

> >>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do

> not want to call this for all tlbwe operation unless it is necessary.

> >>>>>>

> >>>>>> It certainly does more than we need and potentially slows down the fast

> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to

> check for pages that are declared reserved on the host. This happens in 2 cases:

> >>>>>>

> >>>>>>   1) Non cache coherent DMA

> >>>>>>   2) Memory hot remove

> >>>>>>

> >>>>>> The non coherent DMA case would be interesting, as with the mechanism as

> it is in place in Linux today, we could potentially break normal guest operation

> if we don't take it into account. However, it's Kconfig guarded by:

> >>>>>>

> >>>>>>         depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON

> >>>>>>         default n if PPC_47x

> >>>>>>         default y

> >>>>>>

> >>>>>> so we never hit it with any core we care about ;).

> >>>>>>

> >>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry

> about that one either.

> >>>>>

> >>>>> Thanks for this good information :)

> >>>>>

> >>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside

> kvm_is_mmio_pfn() to make sure that check is only valid when that is really

> needed? This can decrease those unnecessary performance loss.

> >>>>>

> >>>>> If I'm wrong please correct me :)

> >>>>

> >>>> You're perfectly right, but this is generic KVM code. So it gets run across

> all architectures. What if someone has the great idea to add a new case here for

> x86, but doesn't tell us? In that case we potentially break x86.

> >>>>

> >>>> I'd rather not like to break x86 :).

> >>>>

> >>>> However, it'd be very interesting to see a benchmark with this change. Do

> you think you could just rip out the whole reserved check and run a few

> benchmarks and show us the results?

> >>>>

> >>>

> >>> Often what case should be adopted to validate this scenario?

> >>

> >> Something which hammers the TLB emulation heavily. I usually just run

> >> /bin/echo a thousand times in "time" and see how long it takes ;)

> >>

> >

> > I tried to run five times with this combination, "time `for ((i=0; i<5000;

> i++));  do /bin/echo; done`", to calculate the average value with this change:

> >

> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index

> > 1580dd4..5e8635b 100644

> > --- a/virt/kvm/kvm_main.c

> > +++ b/virt/kvm/kvm_main.c

> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;

> >

> > bool kvm_is_mmio_pfn(pfn_t pfn)

> > {

> > +#ifdef CONFIG_MEMORY_HOTPLUG

> 

> I'd feel safer if we narrow this down to e500.

> 

> > +       /*

> > +        * Currently only in memory hot remove case we may still need this.

> > +        */

> >        if (pfn_valid(pfn)) {

> 

> We still have to check for pfn_valid, no? So the #ifdef should be down here.

> 

> >                int reserved;

> >                struct page *tail = pfn_to_page(pfn); @@ -124,6 +128,7

> > @@ bool kvm_is_mmio_pfn(pfn_t pfn)

> >                }

> >                return PageReserved(tail);

> >        }

> > +#endif

> >

> >        return true;

> > }

> >

> > Before apply this change:

> >

> > real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5=

> 1m21.376s

> > user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5=

> 0m23.433s

> > sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s

> >

> > After apply this change:

> >

> > real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5=

> 1m20.667s

> > user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5=

> 0m22.615s

> > sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s

> >

> > So,

> >

> > real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%

> > user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%

> > sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%

> 

> Very nice, so there is a real world performance benefit to doing this. Then yes,

> I think it would make sense to change the global helper function to be fast on

> e500 and use that one from e500_shadow_mas2_attrib() instead.


Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as Scott commented?

-Bharat

> 

> Gleb, Paolo, any hard feelings?

> 

> 

> Alex

>
Alexander Graf July 24, 2013, 9:21 a.m. UTC | #3
On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:

> 
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Wednesday, July 24, 2013 1:55 PM
>> To: "“tiejun.chen”"
>> Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; kvm@vger.kernel.org list;
>> Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
>> managed pages
>> 
>> 
>> On 24.07.2013, at 04:26, “tiejun.chen” wrote:
>> 
>>> On 07/18/2013 06:27 PM, Alexander Graf wrote:
>>>> 
>>>> On 18.07.2013, at 12:19, “tiejun.chen” wrote:
>>>> 
>>>>> On 07/18/2013 06:12 PM, Alexander Graf wrote:
>>>>>> 
>>>>>> On 18.07.2013, at 12:08, “tiejun.chen” wrote:
>>>>>> 
>>>>>>> On 07/18/2013 05:48 PM, Alexander Graf wrote:
>>>>>>>> 
>>>>>>>> On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Bhushan Bharat-R65777
>>>>>>>>>> Sent: Thursday, July 18, 2013 1:53 PM
>>>>>>>>>> To: '" tiejun.chen "'
>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>> agraf@suse.de; Wood Scott-
>>>>>>>>>> B07421
>>>>>>>>>> Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
>>>>>>>>>> for kernel managed pages
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:52 PM
>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>> Scott-
>>>>>>>>>>> B07421
>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>> 
>>>>>>>>>>> On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: kvm-ppc-owner@vger.kernel.org
>>>>>>>>>>>>> [mailto:kvm-ppc-owner@vger.kernel.org] On Behalf Of " tiejun.chen "
>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 1:01 PM
>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>> Scott-
>>>>>>>>>>>>> B07421
>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: " tiejun.chen " [mailto:tiejun.chen@windriver.com]
>>>>>>>>>>>>>>> Sent: Thursday, July 18, 2013 11:56 AM
>>>>>>>>>>>>>>> To: Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Cc: kvm-ppc@vger.kernel.org; kvm@vger.kernel.org;
>>>>>>>>>>>>>>> agraf@suse.de; Wood
>>>>>>>>>>>>>>> Scott- B07421; Bhushan Bharat-R65777
>>>>>>>>>>>>>>> Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
>>>>>>>>>>>>>>> only for kernel managed pages
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
>>>>>>>>>>>>>>>> If there is a struct page for the requested mapping then
>>>>>>>>>>>>>>>> it's normal DDR and the mapping sets "M" bit (coherent,
>>>>>>>>>>>>>>>> cacheable) else this is treated as I/O and we set  "I +
>>>>>>>>>>>>>>>> G"  (cache inhibited,
>>>>>>>>>>>>>>>> guarded)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This helps setting proper TLB mapping for direct assigned
>>>>>>>>>>>>>>>> device
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Signed-off-by: Bharat Bhushan
>>>>>>>>>>>>>>>> <bharat.bhushan@freescale.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>   arch/powerpc/kvm/e500_mmu_host.c |   17 ++++++++++++-----
>>>>>>>>>>>>>>>>   1 files changed, 12 insertions(+), 5 deletions(-)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> diff --git a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> index 1c6a9d7..089c227 100644
>>>>>>>>>>>>>>>> --- a/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> +++ b/arch/powerpc/kvm/e500_mmu_host.c
>>>>>>>>>>>>>>>> @@ -64,13 +64,20 @@ static inline u32
>>>>>>>>>>>>>>>> e500_shadow_mas3_attrib(u32 mas3, int
>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>>   	return mas3;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
>>>>>>>>>>>>>>>> usermode)
>>>>>>>>>>>>>>>> +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
>>>>>>>>>>>>>>>> +pfn_t pfn)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>> +	u32 mas2_attr;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	if (!pfn_valid(pfn)) {
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Why not directly use kvm_is_mmio_pfn()?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What I understand from this function (someone can correct
>>>>>>>>>>>>>> me) is that it
>>>>>>>>>>>>> returns "false" when the page is managed by kernel and is
>>>>>>>>>>>>> not marked as RESERVED (for some reason). For us it does not
>>>>>>>>>>>>> matter whether the page is reserved or not, if it is kernel
>>>>>>>>>>>>> visible page then it
>>>>>>>>>> is DDR.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think you are setting I|G by addressing all mmio pages,
>>>>>>>>>>>>> right? If so,
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     KVM: direct mmio pfn check
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     Userspace may specify memory slots that are backed by
>>>>>>>>>>>>> mmio pages rather than
>>>>>>>>>>>>>     normal RAM.  In some cases it is not enough to identify
>>>>>>>>>>>>> these mmio
>>>>>>>>>>> pages
>>>>>>>>>>>>>     by pfn_valid().  This patch adds checking the PageReserved as
>> well.
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you know what are those "some cases" and how checking
>>>>>>>>>>>> PageReserved helps in
>>>>>>>>>>> those cases?
>>>>>>>>>>> 
>>>>>>>>>>> No, myself didn't see these actual cases in qemu,too. But this
>>>>>>>>>>> should be chronically persistent as I understand ;-)
>>>>>>>>>> 
>>>>>>>>>> Then I will wait till someone educate me :)
>>>>>>>>> 
>>>>>>>>> The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do
>> not want to call this for all tlbwe operation unless it is necessary.
>>>>>>>> 
>>>>>>>> It certainly does more than we need and potentially slows down the fast
>> path (RAM mapping). The only thing it does on top of "if (pfn_valid())" is to
>> check for pages that are declared reserved on the host. This happens in 2 cases:
>>>>>>>> 
>>>>>>>>  1) Non cache coherent DMA
>>>>>>>>  2) Memory hot remove
>>>>>>>> 
>>>>>>>> The non coherent DMA case would be interesting, as with the mechanism as
>> it is in place in Linux today, we could potentially break normal guest operation
>> if we don't take it into account. However, it's Kconfig guarded by:
>>>>>>>> 
>>>>>>>>        depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
>>>>>>>>        default n if PPC_47x
>>>>>>>>        default y
>>>>>>>> 
>>>>>>>> so we never hit it with any core we care about ;).
>>>>>>>> 
>>>>>>>> Memory hot remove does not exist on e500 FWIW, so we don't have to worry
>> about that one either.
>>>>>>> 
>>>>>>> Thanks for this good information :)
>>>>>>> 
>>>>>>> So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside
>> kvm_is_mmio_pfn() to make sure that check is only valid when that is really
>> needed? This can decrease those unnecessary performance loss.
>>>>>>> 
>>>>>>> If I'm wrong please correct me :)
>>>>>> 
>>>>>> You're perfectly right, but this is generic KVM code. So it gets run across
>> all architectures. What if someone has the great idea to add a new case here for
>> x86, but doesn't tell us? In that case we potentially break x86.
>>>>>> 
>>>>>> I'd rather not like to break x86 :).
>>>>>> 
>>>>>> However, it'd be very interesting to see a benchmark with this change. Do
>> you think you could just rip out the whole reserved check and run a few
>> benchmarks and show us the results?
>>>>>> 
>>>>> 
>>>>> Often what case should be adopted to validate this scenario?
>>>> 
>>>> Something which hammers the TLB emulation heavily. I usually just run
>>>> /bin/echo a thousand times in "time" and see how long it takes ;)
>>>> 
>>> 
>>> I tried to run five times with this combination, "time `for ((i=0; i<5000;
>> i++));  do /bin/echo; done`", to calculate the average value with this change:
>>> 
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
>>> 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>> 
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> 
>> I'd feel safer if we narrow this down to e500.
>> 
>>> +       /*
>>> +        * Currently only in memory hot remove case we may still need this.
>>> +        */
>>>       if (pfn_valid(pfn)) {
>> 
>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>> 
>>>               int reserved;
>>>               struct page *tail = pfn_to_page(pfn); @@ -124,6 +128,7
>>> @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>               }
>>>               return PageReserved(tail);
>>>       }
>>> +#endif
>>> 
>>>       return true;
>>> }
>>> 
>>> Before apply this change:
>>> 
>>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5=
>> 1m21.376s
>>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5=
>> 0m23.433s
>>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>> 
>>> After apply this change:
>>> 
>>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5=
>> 1m20.667s
>>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5=
>> 0m22.615s
>>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>> 
>>> So,
>>> 
>>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>> 
>> Very nice, so there is a real world performance benefit to doing this. Then yes,
>> I think it would make sense to change the global helper function to be fast on
>> e500 and use that one from e500_shadow_mas2_attrib() instead.
> 
> Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as Scott commented?

rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 24, 2013, 9:35 a.m. UTC | #4
On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> > Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as Scott commented?
> 
> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> 
> 
Because it is much slower and, IIRC, actually used to build pfn map that allow
us to check quickly for valid pfn.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 24, 2013, 9:39 a.m. UTC | #5
On 24.07.2013, at 11:35, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>> Are not we going to use page_is_ram() from  e500_shadow_mas2_attrib() as Scott commented?
>> 
>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>> 
>> 
> Because it is much slower and, IIRC, actually used to build pfn map that allow
> us to check quickly for valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the rest of the kvm code is doing.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 24, 2013, 10:01 a.m. UTC | #6
Copying Andrea for him to verify that I am not talking nonsense :)

On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1580dd4..5e8635b 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> > 
> > bool kvm_is_mmio_pfn(pfn_t pfn)
> > {
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> 
> I'd feel safer if we narrow this down to e500.
> 
> > +       /*
> > +        * Currently only in memory hot remove case we may still need this.
> > +        */
> >        if (pfn_valid(pfn)) {
> 
> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> 
> >                int reserved;
> >                struct page *tail = pfn_to_page(pfn);
> > @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >                }
> >                return PageReserved(tail);
> >        }
> > +#endif
> > 
> >        return true;
> > }
> > 
> > Before apply this change:
> > 
> > real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> > user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> > sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> > 
> > After apply this change:
> > 
> > real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> > user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> > sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> > 
> > So,
> > 
> > real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> > user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> > sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> 
> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> 
> Gleb, Paolo, any hard feelings?
> 
I do not see how can we break the function in such a way and get
away with it. Not all valid pfns point to memory. Physical address can
be sparse (due to PCI hole, framebuffer or just because).

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 24, 2013, 10:09 a.m. UTC | #7
On 24.07.2013, at 12:01, Gleb Natapov wrote:

> Copying Andrea for him to verify that I am not talking nonsense :)
> 
> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 1580dd4..5e8635b 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>> 
>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>> {
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> 
>> I'd feel safer if we narrow this down to e500.
>> 
>>> +       /*
>>> +        * Currently only in memory hot remove case we may still need this.
>>> +        */
>>>       if (pfn_valid(pfn)) {
>> 
>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>> 
>>>               int reserved;
>>>               struct page *tail = pfn_to_page(pfn);
>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>               }
>>>               return PageReserved(tail);
>>>       }
>>> +#endif
>>> 
>>>       return true;
>>> }
>>> 
>>> Before apply this change:
>>> 
>>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
>>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
>>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>> 
>>> After apply this change:
>>> 
>>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
>>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
>>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>> 
>>> So,
>>> 
>>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>> 
>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
>> 
>> Gleb, Paolo, any hard feelings?
>> 
> I do not see how can we break the function in such a way and get
> away with it. Not all valid pfns point to memory. Physical address can
> be sparse (due to PCI hole, framebuffer or just because).

But we don't check for sparseness today in here either. We merely check for incomplete huge pages.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 24, 2013, 10:19 a.m. UTC | #8
On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> 
> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> 
> > Copying Andrea for him to verify that I am not talking nonsense :)
> > 
> > On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index 1580dd4..5e8635b 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >>> 
> >>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>> {
> >>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >> 
> >> I'd feel safer if we narrow this down to e500.
> >> 
> >>> +       /*
> >>> +        * Currently only in memory hot remove case we may still need this.
> >>> +        */
> >>>       if (pfn_valid(pfn)) {
> >> 
> >> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> >> 
> >>>               int reserved;
> >>>               struct page *tail = pfn_to_page(pfn);
> >>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>               }
> >>>               return PageReserved(tail);
> >>>       }
> >>> +#endif
> >>> 
> >>>       return true;
> >>> }
> >>> 
> >>> Before apply this change:
> >>> 
> >>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> >>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> >>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >>> 
> >>> After apply this change:
> >>> 
> >>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> >>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> >>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >>> 
> >>> So,
> >>> 
> >>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> >>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> >>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> >> 
> >> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> >> 
> >> Gleb, Paolo, any hard feelings?
> >> 
> > I do not see how can we break the function in such a way and get
> > away with it. Not all valid pfns point to memory. Physical address can
> > be sparse (due to PCI hole, framebuffer or just because).
> 
> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
> 
That's not how I read the code. The code checks for reserved flag set.
It should be set on pfns that point to memory holes. As far as I
understand huge page tricks they are there to guaranty that THP does not
change flags under us, Andrea?

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 24, 2013, 10:25 a.m. UTC | #9
On 24.07.2013, at 12:19, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
>> 
>> On 24.07.2013, at 12:01, Gleb Natapov wrote:
>> 
>>> Copying Andrea for him to verify that I am not talking nonsense :)
>>> 
>>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
>>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>>> index 1580dd4..5e8635b 100644
>>>>> --- a/virt/kvm/kvm_main.c
>>>>> +++ b/virt/kvm/kvm_main.c
>>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
>>>>> 
>>>>> bool kvm_is_mmio_pfn(pfn_t pfn)
>>>>> {
>>>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>>> 
>>>> I'd feel safer if we narrow this down to e500.
>>>> 
>>>>> +       /*
>>>>> +        * Currently only in memory hot remove case we may still need this.
>>>>> +        */
>>>>>      if (pfn_valid(pfn)) {
>>>> 
>>>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
>>>> 
>>>>>              int reserved;
>>>>>              struct page *tail = pfn_to_page(pfn);
>>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
>>>>>              }
>>>>>              return PageReserved(tail);
>>>>>      }
>>>>> +#endif
>>>>> 
>>>>>      return true;
>>>>> }
>>>>> 
>>>>> Before apply this change:
>>>>> 
>>>>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
>>>>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
>>>>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
>>>>> 
>>>>> After apply this change:
>>>>> 
>>>>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
>>>>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
>>>>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
>>>>> 
>>>>> So,
>>>>> 
>>>>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
>>>>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
>>>>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
>>>> 
>>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
>>>> 
>>>> Gleb, Paolo, any hard feelings?
>>>> 
>>> I do not see how can we break the function in such a way and get
>>> away with it. Not all valid pfns point to memory. Physical address can
>>> be sparse (due to PCI hole, framebuffer or just because).
>> 
>> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
>> 
> That's not how I read the code. The code checks for reserved flag set.
> It should be set on pfns that point to memory holes. As far as I

I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500.

But I'd be more than happy to get proven wrong :).


Alex

> understand huge page tricks they are there to guaranty that THP does not
> change flags under us, Andrea?
> 
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 24, 2013, 10:30 a.m. UTC | #10
On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
> 
> On 24.07.2013, at 12:19, Gleb Natapov wrote:
> 
> > On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
> >> 
> >> On 24.07.2013, at 12:01, Gleb Natapov wrote:
> >> 
> >>> Copying Andrea for him to verify that I am not talking nonsense :)
> >>> 
> >>> On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
> >>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>>>> index 1580dd4..5e8635b 100644
> >>>>> --- a/virt/kvm/kvm_main.c
> >>>>> +++ b/virt/kvm/kvm_main.c
> >>>>> @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
> >>>>> 
> >>>>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>>> {
> >>>>> +#ifdef CONFIG_MEMORY_HOTPLUG
> >>>> 
> >>>> I'd feel safer if we narrow this down to e500.
> >>>> 
> >>>>> +       /*
> >>>>> +        * Currently only in memory hot remove case we may still need this.
> >>>>> +        */
> >>>>>      if (pfn_valid(pfn)) {
> >>>> 
> >>>> We still have to check for pfn_valid, no? So the #ifdef should be down here.
> >>>> 
> >>>>>              int reserved;
> >>>>>              struct page *tail = pfn_to_page(pfn);
> >>>>> @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>>>              }
> >>>>>              return PageReserved(tail);
> >>>>>      }
> >>>>> +#endif
> >>>>> 
> >>>>>      return true;
> >>>>> }
> >>>>> 
> >>>>> Before apply this change:
> >>>>> 
> >>>>> real    (1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 1m21.376s
> >>>>> user    (0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 0m23.433s
> >>>>> sys	(0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
> >>>>> 
> >>>>> After apply this change:
> >>>>> 
> >>>>> real    (1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 1m20.667s
> >>>>> user    (0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 0m22.615s
> >>>>> sys	(0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
> >>>>> 
> >>>>> So,
> >>>>> 
> >>>>> real    (1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
> >>>>> user    (0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
> >>>>> sys	(0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
> >>>> 
> >>>> Very nice, so there is a real world performance benefit to doing this. Then yes, I think it would make sense to change the global helper function to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
> >>>> 
> >>>> Gleb, Paolo, any hard feelings?
> >>>> 
> >>> I do not see how can we break the function in such a way and get
> >>> away with it. Not all valid pfns point to memory. Physical address can
> >>> be sparse (due to PCI hole, framebuffer or just because).
> >> 
> >> But we don't check for sparseness today in here either. We merely check for incomplete huge pages.
> >> 
> > That's not how I read the code. The code checks for reserved flag set.
> > It should be set on pfns that point to memory holes. As far as I
> 
> I couldn't find any traces of code that sets the reserved bits on e500 chips though. I've only seen it getting set for memory hotplug and memory incoherent DMA code which doesn't get used on e500.
> 
> But I'd be more than happy to get proven wrong :).
> 
Can you write a module that scans all page structures? AFAIK all pages
are marked as reserved and then those that become regular memory are
marked as unreserved. Hope Andrea will chime in here :)

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 25, 2013, 8:50 a.m. UTC | #11
On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
> >
> >On 24.07.2013, at 11:35, Gleb Natapov wrote:
> >
> >> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>>> Are not we going to use page_is_ram() from
> >e500_shadow_mas2_attrib() as Scott commented?
> >>>
> >>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>>
> >>>
> >> Because it is much slower and, IIRC, actually used to build pfn
> >map that allow
> >> us to check quickly for valid pfn.
> >
> >Then why should we use page_is_ram()? :)
> >
> >I really don't want the e500 code to diverge too much from what
> >the rest of the kvm code is doing.
> 
> I don't understand "actually used to build pfn map...".  What code
> is this?  I don't see any calls to page_is_ram() in the KVM code, or
> in generic mm code.  Is this a statement about what x86 does?
It may be not page_is_ram() directly, but the same into page_is_ram() is
using. On power both page_is_ram() and do_init_bootmem() walks some kind
of memblock_region data structure. What important is that pfn_valid()
does not mean that there is a memory behind page structure. See Andrea's
reply.

> 
> On PPC page_is_ram() is only called (AFAICT) for determining what
> attributes to set on mmaps.  We want to be sure that KVM always
> makes the same decision.  While pfn_valid() seems like it should be
> equivalent, it's not obvious from the PPC code that it is.
> 
Again pfn_valid() is not enough.

> If pfn_valid() is better, why is that not used for mmap?  Why are
> there two different names for the same thing?
> 
They are not the same thing. page_is_ram() tells you if phys address is
ram backed. pfn_valid() tells you if there is struct page behind the
pfn. PageReserved() tells if you a pfn is marked as reserved. All non
ram pfns should be reserved, but ram pfns can be reserved too. Again,
see Andrea's reply.

Why ppc uses page_is_ram() for mmap? How should I know? But looking at
the function it does it only as a fallback if
ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
noncached as a safe fallback makes sense. It is also make sense to allow
noncached access to reserved ram sometimes.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 25, 2013, 4:07 p.m. UTC | #12
On 25.07.2013, at 10:50, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
>> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
>>> 
>>> On 24.07.2013, at 11:35, Gleb Natapov wrote:
>>> 
>>>> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>>>>> Are not we going to use page_is_ram() from
>>> e500_shadow_mas2_attrib() as Scott commented?
>>>>> 
>>>>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>>>>> 
>>>>> 
>>>> Because it is much slower and, IIRC, actually used to build pfn
>>> map that allow
>>>> us to check quickly for valid pfn.
>>> 
>>> Then why should we use page_is_ram()? :)
>>> 
>>> I really don't want the e500 code to diverge too much from what
>>> the rest of the kvm code is doing.
>> 
>> I don't understand "actually used to build pfn map...".  What code
>> is this?  I don't see any calls to page_is_ram() in the KVM code, or
>> in generic mm code.  Is this a statement about what x86 does?
> It may be not page_is_ram() directly, but the same into page_is_ram() is
> using. On power both page_is_ram() and do_init_bootmem() walks some kind
> of memblock_region data structure. What important is that pfn_valid()
> does not mean that there is a memory behind page structure. See Andrea's
> reply.
> 
>> 
>> On PPC page_is_ram() is only called (AFAICT) for determining what
>> attributes to set on mmaps.  We want to be sure that KVM always
>> makes the same decision.  While pfn_valid() seems like it should be
>> equivalent, it's not obvious from the PPC code that it is.
>> 
> Again pfn_valid() is not enough.
> 
>> If pfn_valid() is better, why is that not used for mmap?  Why are
>> there two different names for the same thing?
>> 
> They are not the same thing. page_is_ram() tells you if phys address is
> ram backed. pfn_valid() tells you if there is struct page behind the
> pfn. PageReserved() tells if you a pfn is marked as reserved. All non
> ram pfns should be reserved, but ram pfns can be reserved too. Again,
> see Andrea's reply.
> 
> Why ppc uses page_is_ram() for mmap? How should I know? But looking at

That one's easy. Let's just ask Ben. Ben, is there any particular reason PPC uses page_is_ram() rather than what KVM does here to figure out whether a pfn is RAM or not? It would be really useful to be able to run the exact same logic that figures out whether we're cacheable or not in both TLB writers (KVM and linux-mm).


Alex

> the function it does it only as a fallback if
> ppc_md.phys_mem_access_prot() is not provided. Making access to MMIO
> noncached as a safe fallback makes sense. It is also make sense to allow
> noncached access to reserved ram sometimes.
> 
> --
> 			Gleb.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov July 25, 2013, 4:14 p.m. UTC | #13
On Thu, Jul 25, 2013 at 06:07:55PM +0200, Alexander Graf wrote:
> 
> On 25.07.2013, at 10:50, Gleb Natapov wrote:
> 
> > On Wed, Jul 24, 2013 at 03:32:49PM -0500, Scott Wood wrote:
> >> On 07/24/2013 04:39:59 AM, Alexander Graf wrote:
> >>> 
> >>> On 24.07.2013, at 11:35, Gleb Natapov wrote:
> >>> 
> >>>> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
> >>>>>> Are not we going to use page_is_ram() from
> >>> e500_shadow_mas2_attrib() as Scott commented?
> >>>>> 
> >>>>> rWhy aren't we using page_is_ram() in kvm_is_mmio_pfn()?
> >>>>> 
> >>>>> 
> >>>> Because it is much slower and, IIRC, actually used to build pfn
> >>> map that allow
> >>>> us to check quickly for valid pfn.
> >>> 
> >>> Then why should we use page_is_ram()? :)
> >>> 
> >>> I really don't want the e500 code to diverge too much from what
> >>> the rest of the kvm code is doing.
> >> 
> >> I don't understand "actually used to build pfn map...".  What code
> >> is this?  I don't see any calls to page_is_ram() in the KVM code, or
> >> in generic mm code.  Is this a statement about what x86 does?
> > It may be not page_is_ram() directly, but the same into page_is_ram() is
> > using. On power both page_is_ram() and do_init_bootmem() walks some kind
> > of memblock_region data structure. What important is that pfn_valid()
> > does not mean that there is a memory behind page structure. See Andrea's
> > reply.
> > 
> >> 
> >> On PPC page_is_ram() is only called (AFAICT) for determining what
> >> attributes to set on mmaps.  We want to be sure that KVM always
> >> makes the same decision.  While pfn_valid() seems like it should be
> >> equivalent, it's not obvious from the PPC code that it is.
> >> 
> > Again pfn_valid() is not enough.
> > 
> >> If pfn_valid() is better, why is that not used for mmap?  Why are
> >> there two different names for the same thing?
> >> 
> > They are not the same thing. page_is_ram() tells you if phys address is
> > ram backed. pfn_valid() tells you if there is struct page behind the
> > pfn. PageReserved() tells if you a pfn is marked as reserved. All non
> > ram pfns should be reserved, but ram pfns can be reserved too. Again,
> > see Andrea's reply.
> > 
> > Why ppc uses page_is_ram() for mmap? How should I know? But looking at
> 
> That one's easy. Let's just ask Ben. Ben, is there any particular reason PPC uses page_is_ram() rather than what KVM does here to figure out whether a pfn is RAM or not? It would be really useful to be able to run the exact same logic that figures out whether we're cacheable or not in both TLB writers (KVM and linux-mm).
> 
KVM does not only try to figure out what is RAM or not! Look at how KVM
uses the function. KVM tries to figure out if refcounting needed to be
used on this page among other things.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1580dd4..5e8635b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,6 +102,10 @@  static bool largepages_enabled = true;

  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
+#ifdef CONFIG_MEMORY_HOTPLUG
+       /*
+        * Currently only in memory hot remove case we may still need this.
+        */
         if (pfn_valid(pfn)) {
                 int reserved;
                 struct page *tail = pfn_to_page(pfn);
@@ -124,6 +128,7 @@  bool kvm_is_mmio_pfn(pfn_t pfn)
                 }
                 return PageReserved(tail);
         }
+#endif

         return true;
  }