diff mbox

[V2,2/2] svm: iommu: Only call guest_iommu_init() after initialized HVM domain

Message ID 573D4D7E.3080208@amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Suravee Suthikulpanit May 19, 2016, 5:22 a.m. UTC
+ Keir (since he added this code originally).

On 05/16/2016 03:19 AM, Paul Durrant wrote:
>> -----Original Message-----
>> >From:suravee.suthikulpanit@amd.com
>> >[mailto:suravee.suthikulpanit@amd.com]
>> >Sent: 13 May 2016 20:37
>> >To:xen-devel@lists.xen.org; George Dunlap;jbeulich@suse.com
>> >Cc: Paul Durrant; Suravee Suthikulpanit; Suravee Suthikulpanit
>> >Subject: [PATCH V2 2/2] svm: iommu: Only call guest_iommu_init() after
>> >initialized HVM domain
>> >
>> >From: Suravee Suthikulpanit<Suravee.Suthikulpanit@amd.com>
>> >
>> >The guest_iommu_init() is currently called by the following code path:
>> >
>> >     arch/x86/domain.c: arch_domain_create()
>> >       ]- drivers/passthrough/iommu.c: iommu_domain_init()
>> >         |- drivers/passthrough/amd/pci_amd_iommu.c:
>> >amd_iommu_domain_init();
>> >           |- drivers/passthrough/amd/iommu_guest.c: guest_iommu_init()
>> >
>> >At this point, the hvm_domain_initialised() has not been
>> >called. So register_mmio_handler(), in guest_iommu_init(), silently fails.
>> >This patch moves the call to guest_iommu_init/destroy() into
>> >the svm_domain_intialise/_destroy() instead.
>> >
> That seems wrong. You're taking a call that currently comes via a jump table, i.e. an abstraction layer, and calling it directly. Is it possible, instead, to move the call to iommu_domain_init() later in arch_domain_create()? It seems odd, to me at least, that it's done before hvm_domain_initialise() anyway.
>
>    Paul
>

Good point. I think I should be able to move iommu_domain_init() call to 
go after hvm_domain_initialise() as the following.


-----

This was added in the commit 66a882392272346ce1d0bc5a26568894f450a7c0,
and only says initialization cleanup and bugfix. I am not sure what bug 
was reported at the time. Anyone has an idea?

Suravee

Comments

Jan Beulich May 19, 2016, 6:03 a.m. UTC | #1
>>> Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> 05/19/16 7:22 AM >>>
>On 05/16/2016 03:19 AM, Paul Durrant wrote:
>> >From:suravee.suthikulpanit@amd.com
>> >Sent: 13 May 2016 20:37
>>> >The guest_iommu_init() is currently called by the following code path:
>>> >
>>> >     arch/x86/domain.c: arch_domain_create()
>>> >       ]- drivers/passthrough/iommu.c: iommu_domain_init()
>>> >         |- drivers/passthrough/amd/pci_amd_iommu.c:
>>> >amd_iommu_domain_init();
>>> >           |- drivers/passthrough/amd/iommu_guest.c: guest_iommu_init()
>>> >
>>> >At this point, the hvm_domain_initialised() has not been
>>> >called. So register_mmio_handler(), in guest_iommu_init(), silently fails.
>>> >This patch moves the call to guest_iommu_init/destroy() into
>>> >the svm_domain_intialise/_destroy() instead.
>>> >
>> That seems wrong. You're taking a call that currently comes via a jump table, i.e. an abstraction layer, and calling it directly. Is it possible, instead, to move the call to iommu_domain_init() later in arch_domain_create()? It seems odd, to me at least, that it's done before hvm_domain_initialise() anyway.
>
>Good point. I think I should be able to move iommu_domain_init() call to 
>go after hvm_domain_initialise() as the following.
>
>--- a/xen/arch/x86/domain.c
>+++ b/xen/arch/x86/domain.c
>@@ -625,24 +625,21 @@ int arch_domain_create(struct domain *d, unsigned 
>int domcr_flags,
>
>if ( (rc = init_domain_irq_mapping(d)) != 0 )
>goto fail;
>-
>-        if ( (rc = iommu_domain_init(d)) != 0 )
>-            goto fail;
>}
>spin_lock_init(&d->arch.e820_lock);
>
>if ( has_hvm_container_domain(d) )
>{
>if ( (rc = hvm_domain_initialise(d)) != 0 )
>-        {
>-            iommu_domain_destroy(d);
>goto fail;
>-        }
>}
      >else
>/* 64-bit PV guest by default. */
>d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>
>+    if ( !is_idle_domain(d) && (rc = iommu_domain_init(d)) != 0 )
>+        goto fail;

This would in the error case fail to undo what hvm_domain_initialise() did.
There was a fix very recently dealing with a similar issue. There really
shouldn't be anything that can fail after hvm_domain_initialise(). And I also
can't see why both of you think iommu_domain_init() has to come later -
that's a function affecting not just HVM guests.

Jan
Paul Durrant May 19, 2016, 7:52 a.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: 19 May 2016 07:04
> To: Suravee.Suthikulpanit@amd.com; Paul Durrant
> Cc: George Dunlap; xen-devel@lists.xen.org; Keir (Xen.org)
> Subject: Re: [PATCH V2 2/2] svm: iommu: Only call guest_iommu_init() after
> initialized HVM domain
> 
> >>> Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> 05/19/16 7:22
> AM >>>
> >On 05/16/2016 03:19 AM, Paul Durrant wrote:
> >> >From:suravee.suthikulpanit@amd.com
> >> >Sent: 13 May 2016 20:37
> >>> >The guest_iommu_init() is currently called by the following code path:
> >>> >
> >>> >     arch/x86/domain.c: arch_domain_create()
> >>> >       ]- drivers/passthrough/iommu.c: iommu_domain_init()
> >>> >         |- drivers/passthrough/amd/pci_amd_iommu.c:
> >>> >amd_iommu_domain_init();
> >>> >           |- drivers/passthrough/amd/iommu_guest.c: guest_iommu_init()
> >>> >
> >>> >At this point, the hvm_domain_initialised() has not been
> >>> >called. So register_mmio_handler(), in guest_iommu_init(), silently
> fails.
> >>> >This patch moves the call to guest_iommu_init/destroy() into
> >>> >the svm_domain_intialise/_destroy() instead.
> >>> >
> >> That seems wrong. You're taking a call that currently comes via a jump
> table, i.e. an abstraction layer, and calling it directly. Is it possible, instead, to
> move the call to iommu_domain_init() later in arch_domain_create()? It
> seems odd, to me at least, that it's done before hvm_domain_initialise()
> anyway.
> >
> >Good point. I think I should be able to move iommu_domain_init() call to
> >go after hvm_domain_initialise() as the following.
> >
> >--- a/xen/arch/x86/domain.c
> >+++ b/xen/arch/x86/domain.c
> >@@ -625,24 +625,21 @@ int arch_domain_create(struct domain *d,
> unsigned
> >int domcr_flags,
> >
> >if ( (rc = init_domain_irq_mapping(d)) != 0 )
> >goto fail;
> >-
> >-        if ( (rc = iommu_domain_init(d)) != 0 )
> >-            goto fail;
> >}
> >spin_lock_init(&d->arch.e820_lock);
> >
> >if ( has_hvm_container_domain(d) )
> >{
> >if ( (rc = hvm_domain_initialise(d)) != 0 )
> >-        {
> >-            iommu_domain_destroy(d);
> >goto fail;
> >-        }
> >}
>       >else
> >/* 64-bit PV guest by default. */
> >d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
> >
> >+    if ( !is_idle_domain(d) && (rc = iommu_domain_init(d)) != 0 )
> >+        goto fail;
> 
> This would in the error case fail to undo what hvm_domain_initialise() did.
> There was a fix very recently dealing with a similar issue. There really
> shouldn't be anything that can fail after hvm_domain_initialise().

Why is that? There are many failure paths within hvm_domain_initialise(). What's wrong with calling hvm_domain_destroy() to undo the whole thing? (Although I do notice that the io_bitmap would appear to leak in that case... but that looks like a bug to me).

> And I also
> can't see why both of you think iommu_domain_init() has to come later -
> that's a function affecting not just HVM guests.
> 

Yes, I realise that. But the problem is that, in the HVM case, it calls functions that make use of infrastructure that's initialized by hvm_domain_initialise().

  Paul

> Jan
Jan Beulich May 19, 2016, 8:59 a.m. UTC | #3
>>> On 19.05.16 at 09:52, <Paul.Durrant@citrix.com> wrote:
>> >--- a/xen/arch/x86/domain.c
>> >+++ b/xen/arch/x86/domain.c
>> >@@ -625,24 +625,21 @@ int arch_domain_create(struct domain *d,
>> unsigned
>> >int domcr_flags,
>> >
>> >if ( (rc = init_domain_irq_mapping(d)) != 0 )
>> >goto fail;
>> >-
>> >-        if ( (rc = iommu_domain_init(d)) != 0 )
>> >-            goto fail;
>> >}
>> >spin_lock_init(&d->arch.e820_lock);
>> >
>> >if ( has_hvm_container_domain(d) )
>> >{
>> >if ( (rc = hvm_domain_initialise(d)) != 0 )
>> >-        {
>> >-            iommu_domain_destroy(d);
>> >goto fail;
>> >-        }
>> >}
>>       >else
>> >/* 64-bit PV guest by default. */
>> >d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>> >
>> >+    if ( !is_idle_domain(d) && (rc = iommu_domain_init(d)) != 0 )
>> >+        goto fail;
>> 
>> This would in the error case fail to undo what hvm_domain_initialise() did.
>> There was a fix very recently dealing with a similar issue. There really
>> shouldn't be anything that can fail after hvm_domain_initialise().
> 
> Why is that? There are many failure paths within hvm_domain_initialise(). 
> What's wrong with calling hvm_domain_destroy() to undo the whole thing? 

Well, I simply didn't check whether hvm_domain_destroy() would be
suitable to be called in that case, but yes, it looks like it would be.

> (Although I do notice that the io_bitmap would appear to leak in that case... 
> but that looks like a bug to me).

That's a hardware domain only aspect, and failure to construct
the hardware domain would be fatal to the system anyway.

Jan
Paul Durrant May 19, 2016, 9:06 a.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 19 May 2016 10:00
> To: Paul Durrant
> Cc: Suravee.Suthikulpanit@amd.com; George Dunlap; xen-
> devel@lists.xen.org; Keir (Xen.org)
> Subject: RE: [PATCH V2 2/2] svm: iommu: Only call guest_iommu_init() after
> initialized HVM domain
> 
> >>> On 19.05.16 at 09:52, <Paul.Durrant@citrix.com> wrote:
> >> >--- a/xen/arch/x86/domain.c
> >> >+++ b/xen/arch/x86/domain.c
> >> >@@ -625,24 +625,21 @@ int arch_domain_create(struct domain *d,
> >> unsigned
> >> >int domcr_flags,
> >> >
> >> >if ( (rc = init_domain_irq_mapping(d)) != 0 )
> >> >goto fail;
> >> >-
> >> >-        if ( (rc = iommu_domain_init(d)) != 0 )
> >> >-            goto fail;
> >> >}
> >> >spin_lock_init(&d->arch.e820_lock);
> >> >
> >> >if ( has_hvm_container_domain(d) )
> >> >{
> >> >if ( (rc = hvm_domain_initialise(d)) != 0 )
> >> >-        {
> >> >-            iommu_domain_destroy(d);
> >> >goto fail;
> >> >-        }
> >> >}
> >>       >else
> >> >/* 64-bit PV guest by default. */
> >> >d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
> >> >
> >> >+    if ( !is_idle_domain(d) && (rc = iommu_domain_init(d)) != 0 )
> >> >+        goto fail;
> >>
> >> This would in the error case fail to undo what hvm_domain_initialise() did.
> >> There was a fix very recently dealing with a similar issue. There really
> >> shouldn't be anything that can fail after hvm_domain_initialise().
> >
> > Why is that? There are many failure paths within hvm_domain_initialise().
> > What's wrong with calling hvm_domain_destroy() to undo the whole
> thing?
> 
> Well, I simply didn't check whether hvm_domain_destroy() would be
> suitable to be called in that case, but yes, it looks like it would be.
> 
> > (Although I do notice that the io_bitmap would appear to leak in that
> case...
> > but that looks like a bug to me).
> 
> That's a hardware domain only aspect, and failure to construct
> the hardware domain would be fatal to the system anyway.
>

Ok. No problem then.

  Paul

 > Jan
diff mbox

Patch

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9d43f7b..ac289b6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -625,24 +625,21 @@  int arch_domain_create(struct domain *d, unsigned 
int domcr_flags,

          if ( (rc = init_domain_irq_mapping(d)) != 0 )
              goto fail;
-
-        if ( (rc = iommu_domain_init(d)) != 0 )
-            goto fail;
      }
      spin_lock_init(&d->arch.e820_lock);

      if ( has_hvm_container_domain(d) )
      {
          if ( (rc = hvm_domain_initialise(d)) != 0 )
-        {
-            iommu_domain_destroy(d);
              goto fail;
-        }
      }
      else
          /* 64-bit PV guest by default. */
          d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;

+    if ( !is_idle_domain(d) && (rc = iommu_domain_init(d)) != 0 )
+        goto fail;
+
      if ( (rc = psr_domain_init(d)) != 0 )
          goto fail;