diff mbox

[v1] x86/mm: Supresses vm_events caused by page-walks

Message ID 1509359568-3349-1-git-send-email-aisaila@bitdefender.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alexandru Stefan ISAILA Oct. 30, 2017, 10:32 a.m. UTC
This patch is adding a way to enable/disable nested pagefault
events. It introduces the xc_monitor_nested_pagefault function
and adds the nested_pagefault_disabled in the monitor structure.
This is needed by the introspection so it will only get gla
faults and not get spammed with other faults.
In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
v->arch.sse_pg_dirty.gla are used to mark that this is the
second time a fault occurs and the dirty bit is set.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
---
 tools/libxc/include/xenctrl.h |  2 ++
 tools/libxc/xc_monitor.c      | 14 ++++++++++++++
 xen/arch/x86/mm/mem_access.c  | 27 +++++++++++++++++++++++++++
 xen/arch/x86/monitor.c        | 13 +++++++++++++
 xen/include/asm-x86/domain.h  |  6 ++++++
 xen/include/asm-x86/monitor.h |  3 ++-
 xen/include/public/domctl.h   |  1 +
 7 files changed, 65 insertions(+), 1 deletion(-)

Comments

Tamas K Lengyel Oct. 30, 2017, 4:01 p.m. UTC | #1
On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
<aisaila@bitdefender.com> wrote:
> This patch is adding a way to enable/disable nested pagefault
> events. It introduces the xc_monitor_nested_pagefault function
> and adds the nested_pagefault_disabled in the monitor structure.
> This is needed by the introspection so it will only get gla
> faults and not get spammed with other faults.
> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
> v->arch.sse_pg_dirty.gla are used to mark that this is the
> second time a fault occurs and the dirty bit is set.

Could you describe under what conditions do you get these other faults?

Thanks,
Tamas
Razvan Cojocaru Oct. 30, 2017, 4:24 p.m. UTC | #2
On 30.10.2017 18:01, Tamas K Lengyel wrote:
> On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
> <aisaila@bitdefender.com> wrote:
>> This patch is adding a way to enable/disable nested pagefault
>> events. It introduces the xc_monitor_nested_pagefault function
>> and adds the nested_pagefault_disabled in the monitor structure.
>> This is needed by the introspection so it will only get gla
>> faults and not get spammed with other faults.
>> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
>> v->arch.sse_pg_dirty.gla are used to mark that this is the
>> second time a fault occurs and the dirty bit is set.
> 
> Could you describe under what conditions do you get these other faults?

Hey Tamas, the whole story is at page 8 of this document:

https://www.researchgate.net/publication/281835515_Proposed_Processor_Extensions_for_Significant_Speedup_of_Hypervisor_Memory_Introspection


Thanks,
Razvan
Tamas K Lengyel Oct. 30, 2017, 4:39 p.m. UTC | #3
On Mon, Oct 30, 2017 at 10:24 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 30.10.2017 18:01, Tamas K Lengyel wrote:
>> On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
>> <aisaila@bitdefender.com> wrote:
>>> This patch is adding a way to enable/disable nested pagefault
>>> events. It introduces the xc_monitor_nested_pagefault function
>>> and adds the nested_pagefault_disabled in the monitor structure.
>>> This is needed by the introspection so it will only get gla
>>> faults and not get spammed with other faults.
>>> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
>>> v->arch.sse_pg_dirty.gla are used to mark that this is the
>>> second time a fault occurs and the dirty bit is set.
>>
>> Could you describe under what conditions do you get these other faults?
>
> Hey Tamas, the whole story is at page 8 of this document:
>
> https://www.researchgate.net/publication/281835515_Proposed_Processor_Extensions_for_Significant_Speedup_of_Hypervisor_Memory_Introspection

Hi Razvan,
thanks but I'm not sure that doc addresses my question. You
effectively filter out npfec_kind_in_gpt and npfec_kind_unknown in
this patch. The first, npfec_kind_in_gpt should only happen if you
have restricted access to the gpt with ept and the processor couldn't
walk the table. But if you don't want to get events of these types
then why not simply not restrict access the gpt to begin with? And as
for npfec_kind_unknown, I don't think that gets generated under any
situation. So hence my question, what is your setup that makes this
patch necessary?

Thanks,
Tamas
Razvan Cojocaru Oct. 30, 2017, 5:01 p.m. UTC | #4
On 10/30/2017 06:39 PM, Tamas K Lengyel wrote:
> On Mon, Oct 30, 2017 at 10:24 AM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> On 30.10.2017 18:01, Tamas K Lengyel wrote:
>>> On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
>>> <aisaila@bitdefender.com> wrote:
>>>> This patch is adding a way to enable/disable nested pagefault
>>>> events. It introduces the xc_monitor_nested_pagefault function
>>>> and adds the nested_pagefault_disabled in the monitor structure.
>>>> This is needed by the introspection so it will only get gla
>>>> faults and not get spammed with other faults.
>>>> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
>>>> v->arch.sse_pg_dirty.gla are used to mark that this is the
>>>> second time a fault occurs and the dirty bit is set.
>>>
>>> Could you describe under what conditions do you get these other faults?
>>
>> Hey Tamas, the whole story is at page 8 of this document:
>>
>> https://www.researchgate.net/publication/281835515_Proposed_Processor_Extensions_for_Significant_Speedup_of_Hypervisor_Memory_Introspection
> 
> Hi Razvan,
> thanks but I'm not sure that doc addresses my question. You
> effectively filter out npfec_kind_in_gpt and npfec_kind_unknown in
> this patch. The first, npfec_kind_in_gpt should only happen if you
> have restricted access to the gpt with ept and the processor couldn't
> walk the table. But if you don't want to get events of these types
> then why not simply not restrict access the gpt to begin with? And as
> for npfec_kind_unknown, I don't think that gets generated under any
> situation. So hence my question, what is your setup that makes this
> patch necessary?

On the npfec_kind_unknown case, indeed, we were wondering when that
might possibly occur when discussing this patch - it's probably reserved
for the future?

On why our introspection engine decides to restrict access to those
specific pages, I am not intimate with its inner workings, and not sure
how much could be disclosed here in any case. Is it not a worthwhile
(and otherwise harmless) tool to be able to switch A/D bits-triggered
EPT faults anyway, for introspection purposes?


Thanks,
Razvan
Tamas K Lengyel Oct. 30, 2017, 5:07 p.m. UTC | #5
On Mon, Oct 30, 2017 at 11:01 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 10/30/2017 06:39 PM, Tamas K Lengyel wrote:
>> On Mon, Oct 30, 2017 at 10:24 AM, Razvan Cojocaru
>> <rcojocaru@bitdefender.com> wrote:
>>> On 30.10.2017 18:01, Tamas K Lengyel wrote:
>>>> On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
>>>> <aisaila@bitdefender.com> wrote:
>>>>> This patch is adding a way to enable/disable nested pagefault
>>>>> events. It introduces the xc_monitor_nested_pagefault function
>>>>> and adds the nested_pagefault_disabled in the monitor structure.
>>>>> This is needed by the introspection so it will only get gla
>>>>> faults and not get spammed with other faults.
>>>>> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
>>>>> v->arch.sse_pg_dirty.gla are used to mark that this is the
>>>>> second time a fault occurs and the dirty bit is set.
>>>>
>>>> Could you describe under what conditions do you get these other faults?
>>>
>>> Hey Tamas, the whole story is at page 8 of this document:
>>>
>>> https://www.researchgate.net/publication/281835515_Proposed_Processor_Extensions_for_Significant_Speedup_of_Hypervisor_Memory_Introspection
>>
>> Hi Razvan,
>> thanks but I'm not sure that doc addresses my question. You
>> effectively filter out npfec_kind_in_gpt and npfec_kind_unknown in
>> this patch. The first, npfec_kind_in_gpt should only happen if you
>> have restricted access to the gpt with ept and the processor couldn't
>> walk the table. But if you don't want to get events of these types
>> then why not simply not restrict access the gpt to begin with? And as
>> for npfec_kind_unknown, I don't think that gets generated under any
>> situation. So hence my question, what is your setup that makes this
>> patch necessary?
>
> On the npfec_kind_unknown case, indeed, we were wondering when that
> might possibly occur when discussing this patch - it's probably reserved
> for the future?
>
> On why our introspection engine decides to restrict access to those
> specific pages, I am not intimate with its inner workings, and not sure
> how much could be disclosed here in any case. Is it not a worthwhile
> (and otherwise harmless) tool to be able to switch A/D bits-triggered
> EPT faults anyway, for introspection purposes?

It changes the default behavior of mem_access events so I just wanted
to get some background on when that is really required. Technically
there is no reason why we couldn't do that filtering in Xen. I think
it might be better to flip the filter the other way though so the
default behavior remains as is (ie. change the option to enable
filtering instead of enabling monitoring).

Tamas
Razvan Cojocaru Oct. 30, 2017, 5:19 p.m. UTC | #6
On 10/30/2017 07:07 PM, Tamas K Lengyel wrote:
> On Mon, Oct 30, 2017 at 11:01 AM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
>> On 10/30/2017 06:39 PM, Tamas K Lengyel wrote:
>>> On Mon, Oct 30, 2017 at 10:24 AM, Razvan Cojocaru
>>> <rcojocaru@bitdefender.com> wrote:
>>>> On 30.10.2017 18:01, Tamas K Lengyel wrote:
>>>>> On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
>>>>> <aisaila@bitdefender.com> wrote:
>>>>>> This patch is adding a way to enable/disable nested pagefault
>>>>>> events. It introduces the xc_monitor_nested_pagefault function
>>>>>> and adds the nested_pagefault_disabled in the monitor structure.
>>>>>> This is needed by the introspection so it will only get gla
>>>>>> faults and not get spammed with other faults.
>>>>>> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
>>>>>> v->arch.sse_pg_dirty.gla are used to mark that this is the
>>>>>> second time a fault occurs and the dirty bit is set.
>>>>>
>>>>> Could you describe under what conditions do you get these other faults?
>>>>
>>>> Hey Tamas, the whole story is at page 8 of this document:
>>>>
>>>> https://www.researchgate.net/publication/281835515_Proposed_Processor_Extensions_for_Significant_Speedup_of_Hypervisor_Memory_Introspection
>>>
>>> Hi Razvan,
>>> thanks but I'm not sure that doc addresses my question. You
>>> effectively filter out npfec_kind_in_gpt and npfec_kind_unknown in
>>> this patch. The first, npfec_kind_in_gpt should only happen if you
>>> have restricted access to the gpt with ept and the processor couldn't
>>> walk the table. But if you don't want to get events of these types
>>> then why not simply not restrict access the gpt to begin with? And as
>>> for npfec_kind_unknown, I don't think that gets generated under any
>>> situation. So hence my question, what is your setup that makes this
>>> patch necessary?
>>
>> On the npfec_kind_unknown case, indeed, we were wondering when that
>> might possibly occur when discussing this patch - it's probably reserved
>> for the future?
>>
>> On why our introspection engine decides to restrict access to those
>> specific pages, I am not intimate with its inner workings, and not sure
>> how much could be disclosed here in any case. Is it not a worthwhile
>> (and otherwise harmless) tool to be able to switch A/D bits-triggered
>> EPT faults anyway, for introspection purposes?
> 
> It changes the default behavior of mem_access events so I just wanted
> to get some background on when that is really required. Technically
> there is no reason why we couldn't do that filtering in Xen. I think
> it might be better to flip the filter the other way though so the
> default behavior remains as is (ie. change the option to enable
> filtering instead of enabling monitoring).

Wait, it shouldn't change the default behaviour at all. If nobody calls
that function, all the EPT event kinds should be sent out - the new
monitor flag is a "disable" flag for non-GLA event (the so-called
"nested page fault" events).


Thanks,
Razvan
Tamas K Lengyel Oct. 30, 2017, 5:38 p.m. UTC | #7
On Mon, Oct 30, 2017 at 11:19 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
> On 10/30/2017 07:07 PM, Tamas K Lengyel wrote:
>> On Mon, Oct 30, 2017 at 11:01 AM, Razvan Cojocaru
>> <rcojocaru@bitdefender.com> wrote:
>>> On 10/30/2017 06:39 PM, Tamas K Lengyel wrote:
>>>> On Mon, Oct 30, 2017 at 10:24 AM, Razvan Cojocaru
>>>> <rcojocaru@bitdefender.com> wrote:
>>>>> On 30.10.2017 18:01, Tamas K Lengyel wrote:
>>>>>> On Mon, Oct 30, 2017 at 4:32 AM, Alexandru Isaila
>>>>>> <aisaila@bitdefender.com> wrote:
>>>>>>> This patch is adding a way to enable/disable nested pagefault
>>>>>>> events. It introduces the xc_monitor_nested_pagefault function
>>>>>>> and adds the nested_pagefault_disabled in the monitor structure.
>>>>>>> This is needed by the introspection so it will only get gla
>>>>>>> faults and not get spammed with other faults.
>>>>>>> In p2m_set_ad_bits the v->arch.sse_pg_dirty.eip and
>>>>>>> v->arch.sse_pg_dirty.gla are used to mark that this is the
>>>>>>> second time a fault occurs and the dirty bit is set.
>>>>>>
>>>>>> Could you describe under what conditions do you get these other faults?
>>>>>
>>>>> Hey Tamas, the whole story is at page 8 of this document:
>>>>>
>>>>> https://www.researchgate.net/publication/281835515_Proposed_Processor_Extensions_for_Significant_Speedup_of_Hypervisor_Memory_Introspection
>>>>
>>>> Hi Razvan,
>>>> thanks but I'm not sure that doc addresses my question. You
>>>> effectively filter out npfec_kind_in_gpt and npfec_kind_unknown in
>>>> this patch. The first, npfec_kind_in_gpt should only happen if you
>>>> have restricted access to the gpt with ept and the processor couldn't
>>>> walk the table. But if you don't want to get events of these types
>>>> then why not simply not restrict access the gpt to begin with? And as
>>>> for npfec_kind_unknown, I don't think that gets generated under any
>>>> situation. So hence my question, what is your setup that makes this
>>>> patch necessary?
>>>
>>> On the npfec_kind_unknown case, indeed, we were wondering when that
>>> might possibly occur when discussing this patch - it's probably reserved
>>> for the future?
>>>
>>> On why our introspection engine decides to restrict access to those
>>> specific pages, I am not intimate with its inner workings, and not sure
>>> how much could be disclosed here in any case. Is it not a worthwhile
>>> (and otherwise harmless) tool to be able to switch A/D bits-triggered
>>> EPT faults anyway, for introspection purposes?
>>
>> It changes the default behavior of mem_access events so I just wanted
>> to get some background on when that is really required. Technically
>> there is no reason why we couldn't do that filtering in Xen. I think
>> it might be better to flip the filter the other way though so the
>> default behavior remains as is (ie. change the option to enable
>> filtering instead of enabling monitoring).
>
> Wait, it shouldn't change the default behaviour at all. If nobody calls
> that function, all the EPT event kinds should be sent out - the new
> monitor flag is a "disable" flag for non-GLA event (the so-called
> "nested page fault" events).

Oh yea you are right, I completely overlooked that it is named
"nested_pagefault_disabled" =) Maybe a comment in the domctl header
would be warranted to note that this is enabled by default when
mem_access is used.

Tamas
diff mbox

Patch

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 666db0b..8e70714 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2056,6 +2056,8 @@  int xc_monitor_descriptor_access(xc_interface *xch, uint32_t domain_id,
                                  bool enable);
 int xc_monitor_guest_request(xc_interface *xch, uint32_t domain_id,
                              bool enable, bool sync, bool allow_userspace);
+int xc_monitor_nested_pagefault(xc_interface *xch, uint32_t domain_id,
+                                bool disable);
 int xc_monitor_debug_exceptions(xc_interface *xch, uint32_t domain_id,
                                 bool enable, bool sync);
 int xc_monitor_cpuid(xc_interface *xch, uint32_t domain_id, bool enable);
diff --git a/tools/libxc/xc_monitor.c b/tools/libxc/xc_monitor.c
index 2840f14..5aacaa8 100644
--- a/tools/libxc/xc_monitor.c
+++ b/tools/libxc/xc_monitor.c
@@ -162,6 +162,20 @@  int xc_monitor_guest_request(xc_interface *xch, uint32_t domain_id, bool enable,
     return do_domctl(xch, &domctl);
 }
 
+int xc_monitor_nested_pagefault(xc_interface *xch, uint32_t domain_id,
+                                bool disable)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_monitor_op;
+    domctl.domain = domain_id;
+    domctl.u.monitor_op.op = disable ? XEN_DOMCTL_MONITOR_OP_ENABLE
+                                    : XEN_DOMCTL_MONITOR_OP_DISABLE;
+    domctl.u.monitor_op.event = XEN_DOMCTL_MONITOR_EVENT_NESTED_PAGEFAULT;
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_monitor_emulate_each_rep(xc_interface *xch, uint32_t domain_id,
                                 bool enable)
 {
diff --git a/xen/arch/x86/mm/mem_access.c b/xen/arch/x86/mm/mem_access.c
index c0cd017..07a334b 100644
--- a/xen/arch/x86/mm/mem_access.c
+++ b/xen/arch/x86/mm/mem_access.c
@@ -137,6 +137,23 @@  bool p2m_mem_access_emulate_check(struct vcpu *v,
     return violation;
 }
 
+static void p2m_set_ad_bits(struct vcpu *v, paddr_t ga)
+{
+    struct hvm_hw_cpu ctxt;
+    uint32_t pfec = 0;
+
+    hvm_funcs.save_cpu_ctxt(v, &ctxt);
+
+    if ( guest_cpu_user_regs()->eip == v->arch.pg_dirty.eip
+         && ga == v->arch.pg_dirty.gla )
+        pfec = PFEC_write_access;
+
+    paging_ga_to_gfn_cr3(v, ctxt.cr3, ga, &pfec, NULL);
+
+    v->arch.pg_dirty.eip = guest_cpu_user_regs()->eip;
+    v->arch.pg_dirty.gla = ga;
+}
+
 bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
                           struct npfec npfec,
                           vm_event_request_t **req_ptr)
@@ -208,6 +225,16 @@  bool p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         }
     }
 
+    if ( vm_event_check_ring(d->vm_event_monitor) &&
+         d->arch.monitor.nested_pagefault_disabled &&
+         npfec.kind != npfec_kind_with_gla ) /* don't send a mem_event */
+    {
+        v->arch.vm_event->emulate_flags = 0;
+        p2m_set_ad_bits(v, gla);
+
+        return true;
+    }
+
     *req_ptr = NULL;
     req = xzalloc(vm_event_request_t);
     if ( req )
diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
index e59f1f5..3916e76 100644
--- a/xen/arch/x86/monitor.c
+++ b/xen/arch/x86/monitor.c
@@ -220,6 +220,19 @@  int arch_monitor_domctl_event(struct domain *d,
         break;
     }
 
+    case XEN_DOMCTL_MONITOR_EVENT_NESTED_PAGEFAULT:
+    {
+        bool old_status = ad->monitor.nested_pagefault_disabled;
+
+        if ( unlikely(old_status == requested_status) )
+            return -EEXIST;
+
+        domain_pause(d);
+        ad->monitor.nested_pagefault_disabled = requested_status;
+        domain_unpause(d);
+        break;
+    }
+
     case XEN_DOMCTL_MONITOR_EVENT_DESC_ACCESS:
     {
         bool old_status = ad->monitor.descriptor_access_enabled;
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4d0b77d..40a365f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -408,6 +408,7 @@  struct arch_domain
         unsigned int descriptor_access_enabled                             : 1;
         unsigned int guest_request_userspace_enabled                       : 1;
         unsigned int emul_unimplemented_enabled                            : 1;
+        unsigned int nested_pagefault_disabled                             : 1;
         struct monitor_msr_bitmap *msr_bitmap;
         uint64_t write_ctrlreg_mask[4];
     } monitor;
@@ -575,6 +576,11 @@  struct arch_vcpu
     /* A secondary copy of the vcpu time info. */
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
 
+    struct {
+        unsigned long eip;
+        unsigned long gla;
+    } pg_dirty;
+
     struct arch_vm_event *vm_event;
 
     struct msr_vcpu_policy *msr;
diff --git a/xen/include/asm-x86/monitor.h b/xen/include/asm-x86/monitor.h
index 0ada970..6b6a146 100644
--- a/xen/include/asm-x86/monitor.h
+++ b/xen/include/asm-x86/monitor.h
@@ -84,7 +84,8 @@  static inline uint32_t arch_monitor_get_capabilities(struct domain *d)
                    (1U << XEN_DOMCTL_MONITOR_EVENT_DEBUG_EXCEPTION) |
                    (1U << XEN_DOMCTL_MONITOR_EVENT_CPUID) |
                    (1U << XEN_DOMCTL_MONITOR_EVENT_INTERRUPT) |
-                   (1U << XEN_DOMCTL_MONITOR_EVENT_EMUL_UNIMPLEMENTED);
+                   (1U << XEN_DOMCTL_MONITOR_EVENT_EMUL_UNIMPLEMENTED) |
+                   (1U << XEN_DOMCTL_MONITOR_EVENT_NESTED_PAGEFAULT);
 
     /* Since we know this is on VMX, we can just call the hvm func */
     if ( hvm_is_singlestep_supported() )
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 70027ab..b5cf06c 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1014,6 +1014,7 @@  struct xen_domctl_psr_cmt_op {
 #define XEN_DOMCTL_MONITOR_EVENT_INTERRUPT             8
 #define XEN_DOMCTL_MONITOR_EVENT_DESC_ACCESS           9
 #define XEN_DOMCTL_MONITOR_EVENT_EMUL_UNIMPLEMENTED    10
+#define XEN_DOMCTL_MONITOR_EVENT_NESTED_PAGEFAULT      11
 
 struct xen_domctl_monitor_op {
     uint32_t op; /* XEN_DOMCTL_MONITOR_OP_* */