
[v7,5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.

Message ID 1488987232-12349-6-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Yu Zhang March 8, 2017, 3:33 p.m. UTC
After an ioreq server has been unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
synchronously, by iterating over the p2m table.

The synchronous reset is necessary because we need to guarantee that
the p2m table is clean before another ioreq server is mapped. And since
sweeping the p2m table can be time consuming, it is done with hypercall
continuation.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

changes in v1:
  - This patch is split out from patch 4 of the last version.
  - According to comments from Jan: update gfn_start when using
    hypercall continuation to reset the p2m type.
  - According to comments from Jan: use min() to compare gfn_end
    and the max mapped pfn in p2m_finish_type_change().
---
 xen/arch/x86/hvm/dm.c          | 43 +++++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/mm/p2m.c          | 29 ++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h      |  5 +++++
 xen/include/public/hvm/dm_op.h |  3 +--
 4 files changed, 73 insertions(+), 7 deletions(-)

Comments

Jan Beulich March 10, 2017, 4:17 p.m. UTC | #1
>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>      return 0;
>  }
>  
> +#define DMOP_op_mask 0xff
>  static int dm_op(domid_t domid,

Please follow the existing continuation model used here, instead of
re-introducing the hvmop one.
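
For reference, a minimal sketch of that continuation model, in the style
of the other sub-op handlers inside dm_op()'s switch: progress is carried
in IN/OUT fields of the sub-op structure itself, which dm_op() copies
back to the guest buffer once const_op is cleared. The sub-op, its field
names and process_one_pfn() below are illustrative only, not actual code:

case XEN_DMOP_example_sweep:
{
    struct xen_dm_op_example_sweep *data = &op.u.example_sweep;

    const_op = false;    /* op is modified; copy it back to the guest */

    rc = 0;
    while ( data->nr > 0 )
    {
        process_one_pfn(d, data->first_pfn);
        data->first_pfn++;            /* record progress in the op... */
        data->nr--;

        if ( data->nr > 0 && hypercall_preempt_check() )
        {
            rc = -ERESTART;   /* ...and ask to be re-invoked with it */
            break;
        }
    }
    break;
}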

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
>  int p2m_change_type_one(struct domain *d, unsigned long gfn,
>                          p2m_type_t ot, p2m_type_t nt);
>  
> +/* Synchronously change types across a range of p2m entries (start ... end) */
> +void p2m_finish_type_change(struct domain *d,
> +                            unsigned long start, unsigned long end,
> +                            p2m_type_t ot, p2m_type_t nt);

The comment should not give the impression that this is an open
range, i.e. it should say [start, end]. But perhaps even better would
be to not use "end" here at all, as frequently (e.g. in
p2m_change_type_range()) it is used to indicate an exclusive range end.

Jan
Andrew Cooper March 10, 2017, 4:59 p.m. UTC | #2
On 08/03/17 15:33, Yu Zhang wrote:
> After an ioreq server has been unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> synchronously, by iterating over the p2m table.
>
> The synchronous reset is necessary because we need to guarantee that
> the p2m table is clean before another ioreq server is mapped. And since
> sweeping the p2m table can be time consuming, it is done with hypercall
> continuation.
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>
> changes in v1:
>   - This patch is split out from patch 4 of the last version.
>   - According to comments from Jan: update gfn_start when using
>     hypercall continuation to reset the p2m type.
>   - According to comments from Jan: use min() to compare gfn_end
>     and the max mapped pfn in p2m_finish_type_change().
> ---
>  xen/arch/x86/hvm/dm.c          | 43 +++++++++++++++++++++++++++++++++++++-----
>  xen/arch/x86/mm/p2m.c          | 29 ++++++++++++++++++++++++++++
>  xen/include/asm-x86/p2m.h      |  5 +++++
>  xen/include/public/hvm/dm_op.h |  3 +--
>  4 files changed, 73 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index f97478b..a92d5d7 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>      return 0;
>  }
>  
> +#define DMOP_op_mask 0xff
>  static int dm_op(domid_t domid,
>                   unsigned int nr_bufs,
>                   xen_dm_op_buf_t bufs[])
> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>      }
>  
>      rc = -EINVAL;
> -    if ( op.pad )
> -        goto out;
>  
> -    switch ( op.op )
> +    switch ( op.op & DMOP_op_mask )

Nack to changes like this.  HVMop continuations only existed in this
form because we had to retrofit them to longterm-stable ABIs in the
past, and there were literally no free bits anywhere else.

Currently, the DMOP ABI isn't set in stone, so you have until code
freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
consider which other DMops might potentially need to become continuable,
and take preventative action before the freeze.


If we need to make similar changes once the ABI truly is frozen, there
are plenty of free bits in the end of the union which can be used
without breaking the ABI.
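
To illustrate that last point with a sketch (hypothetical sub-op, not
actual code): the union in struct xen_dm_op is sized by its largest
member, so the bytes past the end of a smaller sub-op struct already
travel across the ABI but are ignored. A frozen sub-op can therefore
grow a trailing field without moving any existing one:

struct xen_dm_op_some_subop {
    uint32_t existing_field;     /* frozen part of the ABI */
    uint32_t pad;
    /*
     * Hypothetical post-freeze addition: occupies bytes that were
     * already inside the union (sized by its largest member) but
     * unused by this sub-op, so offsets and overall size are
     * unchanged -- i.e. the ABI is not broken.
     */
    uint64_t continuation_cursor;
};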

~Andrew.
Yu Zhang March 11, 2017, 8:42 a.m. UTC | #3
On 3/11/2017 12:17 AM, Jan Beulich wrote:
>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>       return 0;
>>   }
>>   
>> +#define DMOP_op_mask 0xff
>>   static int dm_op(domid_t domid,
> Please follow the existing continuation model used here, instead of
> re-introducing the hvmop one.

Thank you, Jan. Andrew has the same concern as you, so please see my
reply to his comments, where I explained the difficulties I met while
doing this. It would be great if you and Andrew could give some
advice. :-)

>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
>>   int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>                           p2m_type_t ot, p2m_type_t nt);
>>   
>> +/* Synchronously change types across a range of p2m entries (start ... end) */
>> +void p2m_finish_type_change(struct domain *d,
>> +                            unsigned long start, unsigned long end,
>> +                            p2m_type_t ot, p2m_type_t nt);
> The comment should not give the impression that this is an open
> range, i.e. it should say [start, end]. But perhaps even better would
> be to not use "end" here at all, as frequently (e.g. in
> p2m_change_type_range()) it is used to indicate an exclusive range end.

So how about [first_gfn, last_gfn]?

Thanks
Yu

> Jan
Yu Zhang March 11, 2017, 8:42 a.m. UTC | #4
On 3/11/2017 12:59 AM, Andrew Cooper wrote:
> On 08/03/17 15:33, Yu Zhang wrote:
>> After an ioreq server has been unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> synchronously, by iterating over the p2m table.
>>
>> The synchronous reset is necessary because we need to guarantee that
>> the p2m table is clean before another ioreq server is mapped. And since
>> sweeping the p2m table can be time consuming, it is done with hypercall
>> continuation.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> ---
>> Cc: Paul Durrant <paul.durrant@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> changes in v1:
>>    - This patch is split out from patch 4 of the last version.
>>    - According to comments from Jan: update gfn_start when using
>>      hypercall continuation to reset the p2m type.
>>    - According to comments from Jan: use min() to compare gfn_end
>>      and the max mapped pfn in p2m_finish_type_change().
>> ---
>>   xen/arch/x86/hvm/dm.c          | 43 +++++++++++++++++++++++++++++++++++++-----
>>   xen/arch/x86/mm/p2m.c          | 29 ++++++++++++++++++++++++++++
>>   xen/include/asm-x86/p2m.h      |  5 +++++
>>   xen/include/public/hvm/dm_op.h |  3 +--
>>   4 files changed, 73 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
>> index f97478b..a92d5d7 100644
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>       return 0;
>>   }
>>   
>> +#define DMOP_op_mask 0xff
>>   static int dm_op(domid_t domid,
>>                    unsigned int nr_bufs,
>>                    xen_dm_op_buf_t bufs[])
>> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>>       }
>>   
>>       rc = -EINVAL;
>> -    if ( op.pad )
>> -        goto out;
>>   
>> -    switch ( op.op )
>> +    switch ( op.op & DMOP_op_mask )
> Nack to changes like this.  HVMop continuations only existed in this
> form because we had to retrofit them to longterm-stable ABIs in the
> past, and there were literally no free bits anywhere else.

Thank you, Andrew. Frankly, I'm not surprised to see disagreement from
you and Jan on this interface. :)

Using the HVMop-like continuation was a hard choice for me. I saw you
had removed these from the DMops, and I agree that the new interface is
much cleaner.

I noticed there are two other DMops that need to be continuable:
set_mem_type and modified_mem. Both of their structures have fields,
first_gfn and nr, which can be updated and reused across the hypercall
continuation. For map_mem_type_to_ioreq_server, however, we do not need
a gfn exposed in this interface (and I do not think exposing one would
be correct); it is only used when an ioreq server detaches from
p2m_ioreq_server and the p2m sweep has not yet finished. So I changed
the definition of xen_dm_op, removing the pad field and extending op to
64 bits (so that we have free bits). But I do not think this is an
optimal solution either.

> Currently, the DMOP ABI isn't set in stone, so you have until code
> freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
> consider which other DMops might potentially need to become continuable,
> and take preventative action before the freeze.

I can only see set_mem_type and modified_mem needing to become
continuable, and they do this quite elegantly now.

>
> If we need to make similar changes once the ABI truly is frozen, there
> are plenty of free bits in the end of the union which can be used
> without breaking the ABI.

Another proposal is to change the definition of
xen_dm_op_map_mem_type_to_ioreq_server, extending the flags field from
uint32_t to uint64_t and using the upper bits to store the gfn.
Do you think this proposal is acceptable?
Besides using some free bits to store the gfn, do we have any other
approaches? Any suggestions would be welcome! :)

Thanks
Yu

> ~Andrew.
>
Jan Beulich March 13, 2017, 11:24 a.m. UTC | #5
>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/11/2017 12:17 AM, Jan Beulich wrote:
>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
>>>   int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>>                           p2m_type_t ot, p2m_type_t nt);
>>>   
>>> +/* Synchronously change types across a range of p2m entries (start ... end) */
>>> +void p2m_finish_type_change(struct domain *d,
>>> +                            unsigned long start, unsigned long end,
>>> +                            p2m_type_t ot, p2m_type_t nt);
>> The comment should not give the impression that this is an open
>> range, i.e. it should say [start, end]. But perhaps even better would
>> be to not use "end" here at all, as frequently (e.g. in
>> p2m_change_type_range()) it is used to indicate an exclusive range end.
> 
> So how about [first_gfn, last_gfn]?

That's fine.

Jan
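
With the agreed naming, the declaration would read as below (a sketch of
just the rename; the bracket notation makes the inclusive range
explicit):

/* Synchronously change types across a range of p2m entries [first_gfn, last_gfn] */
void p2m_finish_type_change(struct domain *d,
                            unsigned long first_gfn, unsigned long last_gfn,
                            p2m_type_t ot, p2m_type_t nt);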
Jan Beulich March 13, 2017, 11:32 a.m. UTC | #6
>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/11/2017 12:59 AM, Andrew Cooper wrote:
>> On 08/03/17 15:33, Yu Zhang wrote:
>>> --- a/xen/arch/x86/hvm/dm.c
>>> +++ b/xen/arch/x86/hvm/dm.c
>>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>>       return 0;
>>>   }
>>>   
>>> +#define DMOP_op_mask 0xff
>>>   static int dm_op(domid_t domid,
>>>                    unsigned int nr_bufs,
>>>                    xen_dm_op_buf_t bufs[])
>>> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>>>       }
>>>   
>>>       rc = -EINVAL;
>>> -    if ( op.pad )
>>> -        goto out;
>>>   
>>> -    switch ( op.op )
>>> +    switch ( op.op & DMOP_op_mask )
>> Nack to changes like this.  HVMop continuations only existed in this
>> form because we had to retrofit them to longterm-stable ABIs in the
>> past, and there were literally no free bits anywhere else.
> 
> Thank you, Andrew. Frankly, I'm not surprised to see disagreement from
> you and Jan on this interface. :)
>
> Using the HVMop-like continuation was a hard choice for me. I saw you
> had removed these from the DMops, and I agree that the new interface is
> much cleaner.
>
> I noticed there are two other DMops that need to be continuable:
> set_mem_type and modified_mem. Both of their structures have fields,
> first_gfn and nr, which can be updated and reused across the hypercall
> continuation. For map_mem_type_to_ioreq_server, however, we do not need
> a gfn exposed in this interface (and I do not think exposing one would
> be correct); it is only used when an ioreq server detaches from
> p2m_ioreq_server and the p2m sweep has not yet finished. So I changed
> the definition of xen_dm_op, removing the pad field and extending op to
> 64 bits (so that we have free bits). But I do not think this is an
> optimal solution either.
> 
>> Currently, the DMOP ABI isn't set in stone, so you have until code
>> freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
>> consider which other DMops might potentially need to become continuable,
>> and take preventative action before the freeze.
> 
> I can only see set_mem_type and modified_mem needing to become
> continuable, and they do this quite elegantly now.
> 
>>
>> If we need to make similar changes once the ABI truly is frozen, there
>> are plenty of free bits in the end of the union which can be used
>> without breaking the ABI.
> 
> Another proposal is to change the definition of
> xen_dm_op_map_mem_type_to_ioreq_server, extending the flags field from
> uint32_t to uint64_t and using the upper bits to store the gfn.

Well, you introduce a brand new sub-op in patch 2. Why would
you even try to re-use part of some other field in such a
situation, when you can right away add an opaque 64-bit field
you require the caller to set to zero upon first call? The caller
not playing by this would - afaict - only harm the guest it controls.

Jan
Yu Zhang March 14, 2017, 7:42 a.m. UTC | #7
On 3/13/2017 7:32 PM, Jan Beulich wrote:
>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/11/2017 12:59 AM, Andrew Cooper wrote:
>>> On 08/03/17 15:33, Yu Zhang wrote:
>>>> --- a/xen/arch/x86/hvm/dm.c
>>>> +++ b/xen/arch/x86/hvm/dm.c
>>>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>>>        return 0;
>>>>    }
>>>>    
>>>> +#define DMOP_op_mask 0xff
>>>>    static int dm_op(domid_t domid,
>>>>                     unsigned int nr_bufs,
>>>>                     xen_dm_op_buf_t bufs[])
>>>> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>>>>        }
>>>>    
>>>>        rc = -EINVAL;
>>>> -    if ( op.pad )
>>>> -        goto out;
>>>>    
>>>> -    switch ( op.op )
>>>> +    switch ( op.op & DMOP_op_mask )
>>> Nack to changes like this.  HVMop continuations only existed in this
>>> form because we had to retrofit them to longterm-stable ABIs in the
>>> past, and there were literally no free bits anywhere else.
>> Thank you, Andrew. Frankly, I'm not surprised to see disagreement from
>> you and Jan on this interface. :)
>>
>> Using the HVMop-like continuation was a hard choice for me. I saw you
>> had removed these from the DMops, and I agree that the new interface is
>> much cleaner.
>>
>> I noticed there are two other DMops that need to be continuable:
>> set_mem_type and modified_mem. Both of their structures have fields,
>> first_gfn and nr, which can be updated and reused across the hypercall
>> continuation. For map_mem_type_to_ioreq_server, however, we do not need
>> a gfn exposed in this interface (and I do not think exposing one would
>> be correct); it is only used when an ioreq server detaches from
>> p2m_ioreq_server and the p2m sweep has not yet finished. So I changed
>> the definition of xen_dm_op, removing the pad field and extending op to
>> 64 bits (so that we have free bits). But I do not think this is an
>> optimal solution either.
>>
>>> Currently, the DMOP ABI isn't set in stone, so you have until code
>>> freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
>>> consider which other DMops might potentially need to become continuable,
>>> and take preventative action before the freeze.
>> I can only see set_mem_type and modified_mem needing to become
>> continuable, and they do this quite elegantly now.
>>
>>> If we need to make similar changes once the ABI truly is frozen, there
>>> are plenty of free bits in the end of the union which can be used
>>> without breaking the ABI.
>> Another proposal is to change the definition of
>> xen_dm_op_map_mem_type_to_ioreq_server, extending the flags field from
>> uint32_t to uint64_t and using the upper bits to store the gfn.
> Well, you introduce a brand new sub-op in patch 2. Why would
> you even try to re-use part of some other field in such a
> situation, when you can right away add an opaque 64-bit field
> you require the caller to set to zero upon first call? The caller
> not playing by this would - afaict - only harm the guest it controls.

Thanks, Jan.
So you mean change the definition of
xen_dm_op_map_mem_type_to_ioreq_server to something like this?

struct xen_dm_op_map_mem_type_to_ioreq_server {
     ioservid_t id;      /* IN - ioreq server id */
     uint16_t type;      /* IN - memory type */
     uint32_t flags;     /* IN - types of accesses to be forwarded to the
                            ioreq server. flags with 0 means to unmap the
                            ioreq server */

     uint64_t opaque;    /* only used for hypercall continuation, should
                            be set to zero by the caller */
};

If so, is there any approach in the hypervisor to differentiate the
first call from a continued one? Do we need some form of check on this
opaque field? And yes, not playing by this will only harm the guest the
device model controls.

Thanks
Yu

> Jan
>
>
Jan Beulich March 14, 2017, 10:51 a.m. UTC | #8
>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
> So you mean change the definition of
> xen_dm_op_map_mem_type_to_ioreq_server to something like this?
> 
> struct xen_dm_op_map_mem_type_to_ioreq_server {
>      ioservid_t id;      /* IN - ioreq server id */
>      uint16_t type;      /* IN - memory type */
>      uint32_t flags;     /* IN - types of accesses to be forwarded to the
>                             ioreq server. flags with 0 means to unmap the
>                             ioreq server */
> 
>      uint64_t opaque;    /* only used for hypercall continuation, should
>                             be set to zero by the caller */
> };

Yes, with the field marked IN OUT and "should" replaced by "has to".
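
That is, with both corrections applied the struct would read (a sketch;
only the last field's annotation changes):

struct xen_dm_op_map_mem_type_to_ioreq_server {
    ioservid_t id;      /* IN - ioreq server id */
    uint16_t type;      /* IN - memory type */
    uint32_t flags;     /* IN - types of accesses to be forwarded to the
                           ioreq server. flags with 0 means to unmap the
                           ioreq server */

    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
                           has to be set to zero by the caller */
};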

> If so, is there any approach in the hypervisor to differentiate the
> first call from a continued one? Do we need some form of check on this
> opaque field?

I don't see a need (other than - of course - the obvious upper
bound imposed here), due to ...

> And yes, not playing by this will only harm the guest the device
> model controls.

... this.

Jan
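
Putting this sub-thread's conclusions together, a sketch of how the
dm.c handler could use the opaque field as its continuation cursor
instead of encoding the gfn in op.op (names are taken from the patch
below; data is assumed to be a non-const pointer to the sub-op, and
the loop condition provides the upper bound mentioned above):

unsigned long first_gfn = data->opaque;    /* zero on the first call */
struct p2m_domain *p2m = p2m_get_hostp2m(d);

const_op = false;            /* opaque must be copied back on restart */

if ( first_gfn == 0 )
    rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
                                          data->type, data->flags);
else
    rc = 0;

/* Reset remaining p2m_ioreq_server entries once the server unmapped. */
if ( rc == 0 && data->flags == 0 )
{
    while ( read_atomic(&p2m->ioreq.entry_count) &&
            first_gfn <= p2m->max_mapped_pfn )
    {
        /* p2m_finish_type_change() clamps the end to max_mapped_pfn. */
        p2m_finish_type_change(d, first_gfn, first_gfn + 0xff,
                               p2m_ioreq_server, p2m_ram_rw);
        first_gfn += 0x100;

        /* Check for continuation if it's not the last iteration. */
        if ( first_gfn <= p2m->max_mapped_pfn &&
             hypercall_preempt_check() )
        {
            rc = -ERESTART;
            data->opaque = first_gfn;    /* progress travels in the op */
            break;
        }
    }
}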
Yu Zhang March 14, 2017, 12:22 p.m. UTC | #9
On 3/14/2017 6:51 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>> So you mean change the definition of
>> xen_dm_op_map_mem_type_to_ioreq_server to something like this?
>>
>> struct xen_dm_op_map_mem_type_to_ioreq_server {
>>       ioservid_t id;      /* IN - ioreq server id */
>>       uint16_t type;      /* IN - memory type */
>>       uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>                              ioreq server. flags with 0 means to unmap the
>>                              ioreq server */
>>
>>       uint64_t opaque;    /* only used for hypercall continuation, should
>>                              be set to zero by the caller */
>> };
> Yes, with the field marked IN OUT and "should" replaced by "has to".

Got it. BTW, I can define this opaque field in patch 2/5 - the
introduction of this dm_op - or add it in this patch 5/5, where we use
it in the hypercall continuation. Which one would you prefer?

>> If so, is there any approach in the hypervisor to differentiate the
>> first call from a continued one? Do we need some form of check on this
>> opaque field?
> I don't see a need (other than - of course - the obvious upper
> bound imposed here), due to ...
>
>> And yes, not playing by this will only harm the guest the device
>> model controls.
> ... this.

OK. I'm fine with this too. :-)

Thanks
Yu

> Jan
>
>
Jan Beulich March 14, 2017, 1:12 p.m. UTC | #10
>>> On 14.03.17 at 13:22, <yu.c.zhang@linux.intel.com> wrote:
> On 3/14/2017 6:51 PM, Jan Beulich wrote:
>>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>>> So you mean change the definition of
>>> xen_dm_op_map_mem_type_to_ioreq_server to something like this?
>>>
>>> struct xen_dm_op_map_mem_type_to_ioreq_server {
>>>       ioservid_t id;      /* IN - ioreq server id */
>>>       uint16_t type;      /* IN - memory type */
>>>       uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>                              ioreq server. flags with 0 means to unmap the
>>>                              ioreq server */
>>>
>>>       uint64_t opaque;    /* only used for hypercall continuation, should
>>>                              be set to zero by the caller */
>>> };
>> Yes, with the field marked IN OUT and "should" replaced by "has to".
> 
> Got it. BTW, I can define this opaque field in patch 2/5 - the
> introduction of this dm_op - or add it in this patch 5/5, where we use
> it in the hypercall continuation. Which one would you prefer?

I'd prefer if you added it right away, i.e. in patch 2.

Jan
Yu Zhang March 14, 2017, 1:29 p.m. UTC | #11
On 3/14/2017 9:12 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 13:22, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/14/2017 6:51 PM, Jan Beulich wrote:
>>>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>>>> So you mean change the definition of
>>>> xen_dm_op_map_mem_type_to_ioreq_server to something like this?
>>>>
>>>> struct xen_dm_op_map_mem_type_to_ioreq_server {
>>>>        ioservid_t id;      /* IN - ioreq server id */
>>>>        uint16_t type;      /* IN - memory type */
>>>>        uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>>                               ioreq server. flags with 0 means to unmap the
>>>>                               ioreq server */
>>>>
>>>>        uint64_t opaque;    /* only used for hypercall continuation, should
>>>>                               be set to zero by the caller */
>>>> };
>>> Yes, with the field marked IN OUT and "should" replaced by "has to".
>> Got it. BTW, I can define this opaque field in patch 2/5 - the
>> introduction of this dm_op - or add it in this patch 5/5, where we use
>> it in the hypercall continuation. Which one would you prefer?
> I'd prefer if you added it right away, i.e. in patch 2.

Got it. Thank you, Jan. :)

Yu

> Jan

Patch

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index f97478b..a92d5d7 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -288,6 +288,7 @@  static int inject_event(struct domain *d,
     return 0;
 }
 
+#define DMOP_op_mask 0xff
 static int dm_op(domid_t domid,
                  unsigned int nr_bufs,
                  xen_dm_op_buf_t bufs[])
@@ -315,10 +316,8 @@  static int dm_op(domid_t domid,
     }
 
     rc = -EINVAL;
-    if ( op.pad )
-        goto out;
 
-    switch ( op.op )
+    switch ( op.op & DMOP_op_mask )
     {
     case XEN_DMOP_create_ioreq_server:
     {
@@ -387,6 +386,10 @@  static int dm_op(domid_t domid,
     {
         const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
             &op.u.map_mem_type_to_ioreq_server;
+        unsigned long gfn_start = op.op & ~DMOP_op_mask;
+        unsigned long gfn_end;
+
+        const_op = false;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -396,8 +399,38 @@  static int dm_op(domid_t domid,
         if ( !hap_enabled(d) )
             break;
 
-        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
-                                              data->type, data->flags);
+        if ( gfn_start == 0 )
+            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                                  data->type, data->flags);
+        /*
+         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
+         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
+         */
+        if ( (gfn_start > 0) || (data->flags == 0 && rc == 0) )
+        {
+            struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+            while ( read_atomic(&p2m->ioreq.entry_count) &&
+                    gfn_start <= p2m->max_mapped_pfn )
+            {
+                gfn_end = gfn_start + DMOP_op_mask;
+
+                p2m_finish_type_change(d, gfn_start, gfn_end,
+                                       p2m_ioreq_server, p2m_ram_rw);
+
+                gfn_start = gfn_end + 1;
+
+                /* Check for continuation if it's not the last iteration. */
+                if ( gfn_start <= p2m->max_mapped_pfn &&
+                     hypercall_preempt_check() )
+                {
+                    rc = -ERESTART;
+                    op.op |= gfn_start;
+                    break;
+                }
+            }
+        }
+
         break;
     }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 94d7141..9a81f00 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1038,6 +1038,35 @@  void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+/* Synchronously modify the p2m type of a range of gfns from ot to nt. */
+void p2m_finish_type_change(struct domain *d,
+                            unsigned long start, unsigned long end,
+                            p2m_type_t ot, p2m_type_t nt)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    p2m_type_t t;
+    unsigned long gfn = start;
+
+    ASSERT(start <= end);
+    ASSERT(ot != nt);
+    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
+
+    p2m_lock(p2m);
+
+    end = min(end, p2m->max_mapped_pfn);
+    while ( gfn <= end )
+    {
+        get_gfn_query_unlocked(d, gfn, &t);
+
+        if ( t == ot )
+            p2m_change_type_one(d, gfn, t, nt);
+
+        gfn++;
+    }
+
+    p2m_unlock(p2m);
+}
+
 /*
  * Returns:
  *    0              for success
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 395f125..3eadd89 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -611,6 +611,11 @@  void p2m_change_type_range(struct domain *d,
 int p2m_change_type_one(struct domain *d, unsigned long gfn,
                         p2m_type_t ot, p2m_type_t nt);
 
+/* Synchronously change types across a range of p2m entries (start ... end) */
+void p2m_finish_type_change(struct domain *d,
+                            unsigned long start, unsigned long end,
+                            p2m_type_t ot, p2m_type_t nt);
+
 /* Report a change affecting memory types. */
 void p2m_memory_type_changed(struct domain *d);
 
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index c643b67..23b364b 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -343,8 +343,7 @@  struct xen_dm_op_map_mem_type_to_ioreq_server {
 };
 
 struct xen_dm_op {
-    uint32_t op;
-    uint32_t pad;
+    uint64_t op;
     union {
         struct xen_dm_op_create_ioreq_server create_ioreq_server;
         struct xen_dm_op_get_ioreq_server_info get_ioreq_server_info;