diff mbox series

[v1] Reset iomem's gfn to LIBXL_INVALID_GFN on reboot

Message ID 20191010135004.24226-1-al1img@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v1] Reset iomem's gfn to LIBXL_INVALID_GFN on reboot | expand

Commit Message

Oleksandr Grytsov Oct. 10, 2019, 1:50 p.m. UTC
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

During domain reboot its configuration is partially reused
to re-create a new domain, but iomem's GFN field for the
iomem is only restored for those memory ranges, which are
configured in form of [IOMEM_START,NUM_PAGES[@GFN], but not for
those in form of [IOMEM_START,NUM_PAGES], e.g. without GFN.
For the latter GFN is reset to 0, but while mapping ranges
to a domain during reboot there is a check that GFN treated
as valid if it is not equal to LIBXL_INVALID_GFN, thus making
Xen to map IOMEM_START to address 0 in the guest's address space.

Workaround it by resseting GFN to LIBXL_INVALID_GFN, so xl
can set proper values for mapping on reboot.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
 tools/libxl/libxl_domain.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Ian Jackson Oct. 11, 2019, 3:14 p.m. UTC | #1
Oleksandr Grytsov writes ("[PATCH v1] Reset iomem's gfn to LIBXL_INVALID_GFN on reboot"):
> During domain reboot its configuration is partially reused
> to re-create a new domain, but iomem's GFN field for the
> iomem is only restored for those memory ranges, which are
> configured in form of [IOMEM_START,NUM_PAGES[@GFN], but not for
> those in form of [IOMEM_START,NUM_PAGES], e.g. without GFN.
> For the latter GFN is reset to 0, but while mapping ranges
> to a domain during reboot there is a check that GFN treated
> as valid if it is not equal to LIBXL_INVALID_GFN, thus making
> Xen to map IOMEM_START to address 0 in the guest's address space.
> 
> Workaround it by resseting GFN to LIBXL_INVALID_GFN, so xl
> can set proper values for mapping on reboot.

Thanks for this patch.

I confess that I am not sure what is going on here.  Where is this
troublesome 0 coming from ?  I see that the default value for gfn in
the struct is 0 and looking for assignments before this patch, gfn is
defaulted from b_info->iomem[i].start, which is presumably non-0.

I suspect that your patch may be fixing this the wrong way.  I have
addressed this mail to various people who have touched this area of
code and hope they will be able to clarify.

BTW, please do ping this (and your other patches) by email, if the
conversation seems to stall.

Thanks,
Ian.

> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  tools/libxl/libxl_domain.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
> index 9d0eb5aed1..0ae16a5b12 100644
> --- a/tools/libxl/libxl_domain.c
> +++ b/tools/libxl/libxl_domain.c
> @@ -2120,6 +2120,15 @@ static void retrieve_domain_configuration_end(libxl__egc *egc,
>          }
>      }
>  
> +    /* reset IOMEM's GFN to initial value */
> +    {
> +        int i;
> +
> +        for (i = 0; i < d_config->b_info.num_iomem; i++)
> +            if (d_config->b_info.iomem[i].gfn == 0)
> +                d_config->b_info.iomem[i].gfn = LIBXL_INVALID_GFN;
> +    }
> +
>      /* Devices: disk, nic, vtpm, pcidev etc. */
>  
>      /* The MERGE macro implements following logic:
> -- 
> 2.17.1
>
Julien Grall Oct. 14, 2019, 9:28 a.m. UTC | #2
Hi Ian,

On 11/10/2019 16:14, Ian Jackson wrote:
> Oleksandr Grytsov writes ("[PATCH v1] Reset iomem's gfn to LIBXL_INVALID_GFN on reboot"):
>> During domain reboot its configuration is partially reused
>> to re-create a new domain, but iomem's GFN field for the
>> iomem is only restored for those memory ranges, which are
>> configured in form of [IOMEM_START,NUM_PAGES[@GFN], but not for
>> those in form of [IOMEM_START,NUM_PAGES], e.g. without GFN.
>> For the latter GFN is reset to 0, but while mapping ranges
>> to a domain during reboot there is a check that GFN treated
>> as valid if it is not equal to LIBXL_INVALID_GFN, thus making
>> Xen to map IOMEM_START to address 0 in the guest's address space.
>>
>> Workaround it by resseting GFN to LIBXL_INVALID_GFN, so xl
>> can set proper values for mapping on reboot.
> 
> Thanks for this patch.
> 
> I confess that I am not sure what is going on here.  Where is this
> troublesome 0 coming from ?  I see that the default value for gfn in
> the struct is 0 and looking for assignments before this patch, gfn is
> defaulted from b_info->iomem[i].start, which is presumably non-0.
> 
> I suspect that your patch may be fixing this the wrong way.  I have
> addressed this mail to various people who have touched this area of
> code and hope they will be able to clarify.

I found a thread from December 2017 related to the problem described here [1].

Looking at the thread, there were no conclusion on the root causes and some 
questions were left unanswered by the contributor (see [2]).

Oleksandr, could you look at the thread and see if you can provide more details 
what's going on? Answering back here would be fine.

> 
> BTW, please do ping this (and your other patches) by email, if the
> conversation seems to stall.
> 
> Thanks,
> Ian.
> 
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   tools/libxl/libxl_domain.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
>> index 9d0eb5aed1..0ae16a5b12 100644
>> --- a/tools/libxl/libxl_domain.c
>> +++ b/tools/libxl/libxl_domain.c
>> @@ -2120,6 +2120,15 @@ static void retrieve_domain_configuration_end(libxl__egc *egc,
>>           }
>>       }
>>   
>> +    /* reset IOMEM's GFN to initial value */
>> +    {
>> +        int i;
>> +
>> +        for (i = 0; i < d_config->b_info.num_iomem; i++)
>> +            if (d_config->b_info.iomem[i].gfn == 0)
>> +                d_config->b_info.iomem[i].gfn = LIBXL_INVALID_GFN;
>> +    }
>> +
>>       /* Devices: disk, nic, vtpm, pcidev etc. */
>>   
>>       /* The MERGE macro implements following logic:
>> -- 
>> 2.17.1
>>

Cheers,

[1] <ebf78aec-dcfd-72d9-dac2-06b29e4a66ae@gmail.com>
[2] <20180213122432.h4fh22ej4dfe7226@citrix.com>
Oleksandr Grytsov Oct. 16, 2019, 1:09 p.m. UTC | #3
On Mon, Oct 14, 2019 at 12:28 PM Julien Grall <julien.grall@arm.com> wrote:
>
> Hi Ian,
>
> On 11/10/2019 16:14, Ian Jackson wrote:
> > Oleksandr Grytsov writes ("[PATCH v1] Reset iomem's gfn to LIBXL_INVALID_GFN on reboot"):
> >> During domain reboot its configuration is partially reused
> >> to re-create a new domain, but iomem's GFN field for the
> >> iomem is only restored for those memory ranges, which are
> >> configured in form of [IOMEM_START,NUM_PAGES[@GFN], but not for
> >> those in form of [IOMEM_START,NUM_PAGES], e.g. without GFN.
> >> For the latter GFN is reset to 0, but while mapping ranges
> >> to a domain during reboot there is a check that GFN treated
> >> as valid if it is not equal to LIBXL_INVALID_GFN, thus making
> >> Xen to map IOMEM_START to address 0 in the guest's address space.
> >>
> >> Workaround it by resseting GFN to LIBXL_INVALID_GFN, so xl
> >> can set proper values for mapping on reboot.
> >
> > Thanks for this patch.
> >
> > I confess that I am not sure what is going on here.  Where is this
> > troublesome 0 coming from ?  I see that the default value for gfn in
> > the struct is 0 and looking for assignments before this patch, gfn is
> > defaulted from b_info->iomem[i].start, which is presumably non-0.
> >
> > I suspect that your patch may be fixing this the wrong way.  I have
> > addressed this mail to various people who have touched this area of
> > code and hope they will be able to clarify.
>
> I found a thread from December 2017 related to the problem described here [1].
>
> Looking at the thread, there were no conclusion on the root causes and some
> questions were left unanswered by the contributor (see [2]).
>
> Oleksandr, could you look at the thread and see if you can provide more details
> what's going on? Answering back here would be fine.
>
> >
> > BTW, please do ping this (and your other patches) by email, if the
> > conversation seems to stall.
> >
> > Thanks,
> > Ian.
> >
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >> ---
> >>   tools/libxl/libxl_domain.c | 9 +++++++++
> >>   1 file changed, 9 insertions(+)
> >>
> >> diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
> >> index 9d0eb5aed1..0ae16a5b12 100644
> >> --- a/tools/libxl/libxl_domain.c
> >> +++ b/tools/libxl/libxl_domain.c
> >> @@ -2120,6 +2120,15 @@ static void retrieve_domain_configuration_end(libxl__egc *egc,
> >>           }
> >>       }
> >>
> >> +    /* reset IOMEM's GFN to initial value */
> >> +    {
> >> +        int i;
> >> +
> >> +        for (i = 0; i < d_config->b_info.num_iomem; i++)
> >> +            if (d_config->b_info.iomem[i].gfn == 0)
> >> +                d_config->b_info.iomem[i].gfn = LIBXL_INVALID_GFN;
> >> +    }
> >> +
> >>       /* Devices: disk, nic, vtpm, pcidev etc. */
> >>
> >>       /* The MERGE macro implements following logic:
> >> --
> >> 2.17.1
> >>
>
> Cheers,
>
> [1] <ebf78aec-dcfd-72d9-dac2-06b29e4a66ae@gmail.com>
> [2] <20180213122432.h4fh22ej4dfe7226@citrix.com>
>
> --
> Julien Grall

Julien,

Thanks to point me out for this old thread. I completely forgot about it
(I haven't worked with xen since long time). I've performed additional
investigation
and found the root cause of the issue. It doesn't relate to iomem GFN directly.
The problem is in type from json parsing at place where libxl creates array of
struct.

For example, libxl_domain_config_from_json calls libxl_domain_config_init
which initializes all child structures and arrays. But then when libxl parses
json and creates the array of structure, it doesn't initialize array elements
properly (see libxl__domain_build_info_parse_json iomem parsing):

p->num_iomem = x->u.array->count;
p->iomem = libxl__calloc(NOGC, p->num_iomem, sizeof(*p->iomem));
if (!p->iomem && p->num_iomem != 0) {
    rc = -1;
    goto out;
}
for (i=0; (t=libxl__json_array_get(x,i)); i++) {
    rc = libxl__iomem_range_parse_json(gc, t, &p->iomem[i]);
    if (rc)
       goto out;
}

libxl creates array element with calloc function, so all element
fields are initialized
with zero values. Even some of them have default value different from zero.
For these purpose dedicated init function should be called for each element.
Above example should be:

for (i=0; (t=libxl__json_array_get(x,i)); i++) {
    libxl_iomem_range_init(&p->iomem[i]);
    rc = libxl__iomem_range_parse_json(gc, t, &p->iomem[i]);
    if (rc)
       goto out;
}

I've changes gentypes.py as following:

diff --git a/tools/libxl/gentypes.py b/tools/libxl/gentypes.py
index 88e5c5f30e..92e28be469 100644
--- a/tools/libxl/gentypes.py
+++ b/tools/libxl/gentypes.py
@@ -454,6 +454,8 @@ def libxl_C_type_parse_json(ty, w, v, indent = "
 ", parent = None, discrimina
         s += "        goto out;\n"
         s += "    }\n"
         s += "    for (i=0; (t=libxl__json_array_get(x,i)); i++) {\n"
+        if ty.elem_type.init_fn is not None and
ty.elem_type.autogenerate_init_fn:
+            s += indent + "    "+"%s_init(&%s[i]);\n" %
(ty.elem_type.typename, v)
         s += libxl_C_type_parse_json(ty.elem_type, "t", v+"[i]",
                                      indent + "    ", parent)
         s += "    }\n"

I'm not sure is it right and complete fix.

Ian, could you review?

If the fix is ok, I will submit the patch.

Thanks.
Julien Grall Oct. 22, 2019, 6:14 p.m. UTC | #4
Hi Oleksandr,

Apologies for the late answer.

On 16/10/2019 14:09, Oleksandr Grytsov wrote:
> On Mon, Oct 14, 2019 at 12:28 PM Julien Grall <julien.grall@arm.com> wrote:
> Thanks to point me out for this old thread. I completely forgot about it
> (I haven't worked with xen since long time). I've performed additional
> investigation
> and found the root cause of the issue. It doesn't relate to iomem GFN directly.
> The problem is in type from json parsing at place where libxl creates array of
> struct.
> 
> For example, libxl_domain_config_from_json calls libxl_domain_config_init
> which initializes all child structures and arrays. But then when libxl parses
> json and creates the array of structure, it doesn't initialize array elements
> properly (see libxl__domain_build_info_parse_json iomem parsing):
> 
> p->num_iomem = x->u.array->count;
> p->iomem = libxl__calloc(NOGC, p->num_iomem, sizeof(*p->iomem));
> if (!p->iomem && p->num_iomem != 0) {
>      rc = -1;
>      goto out;
> }
> for (i=0; (t=libxl__json_array_get(x,i)); i++) {
>      rc = libxl__iomem_range_parse_json(gc, t, &p->iomem[i]);
>      if (rc)
>         goto out;
> }
> 
> libxl creates array element with calloc function, so all element
> fields are initialized
> with zero values. Even some of them have default value different from zero.
> For these purpose dedicated init function should be called for each element.
> Above example should be:
> 
> for (i=0; (t=libxl__json_array_get(x,i)); i++) {
>      libxl_iomem_range_init(&p->iomem[i]);
>      rc = libxl__iomem_range_parse_json(gc, t, &p->iomem[i]);
>      if (rc)
>         goto out;
> }

Not initializing the values is fine as long as the JSON describes all the fields 
of the structure.

The key point here is the GFN is not described in the JSON (see 
libxl_iomem_range_gen_json) if it is equal to LIBXL_INVALID_GFN. As the field is 
not described, it will be defaulted to 0.

> 
> I've changes gentypes.py as following:
> 
> diff --git a/tools/libxl/gentypes.py b/tools/libxl/gentypes.py
> index 88e5c5f30e..92e28be469 100644
> --- a/tools/libxl/gentypes.py
> +++ b/tools/libxl/gentypes.py
> @@ -454,6 +454,8 @@ def libxl_C_type_parse_json(ty, w, v, indent = "
>   ", parent = None, discrimina
>           s += "        goto out;\n"
>           s += "    }\n"
>           s += "    for (i=0; (t=libxl__json_array_get(x,i)); i++) {\n"
> +        if ty.elem_type.init_fn is not None and
> ty.elem_type.autogenerate_init_fn:

My knowledge of libxl is quite limited. But I don't think this is correct, you 
want to call init_fn whether this has been autogenerated or not.

> +            s += indent + "    "+"%s_init(&%s[i]);\n" %
> (ty.elem_type.typename, v)

Looking at the other usage (like _libxl_C_type_init), init_fn is called with

             s += "%s(%s);\n" % (ty.init_fn, ty.pass_arg(v, parent is None))

I am also not entirely sure whether we should also cater the ty.init_val != None 
as well here.

>           s += libxl_C_type_parse_json(ty.elem_type, "t", v+"[i]",
>                                        indent + "    ", parent)
>           s += "    }\n"
> 
> I'm not sure is it right and complete fix.
> 
> Ian, could you review?
> 
> If the fix is ok, I will submit the patch.

IHMO, the idea is there. The code may need some modifications (see above). 
Please post a patch so we can go forward in the process to review it.

Cheers,
Oleksandr Grytsov Oct. 24, 2019, 2:10 p.m. UTC | #5
On Tue, Oct 22, 2019 at 9:14 PM Julien Grall <julien.grall@arm.com> wrote:
>
> Hi Oleksandr,
>
> Apologies for the late answer.
>
> On 16/10/2019 14:09, Oleksandr Grytsov wrote:
> > On Mon, Oct 14, 2019 at 12:28 PM Julien Grall <julien.grall@arm.com> wrote:
> > Thanks to point me out for this old thread. I completely forgot about it
> > (I haven't worked with xen since long time). I've performed additional
> > investigation
> > and found the root cause of the issue. It doesn't relate to iomem GFN directly.
> > The problem is in type from json parsing at place where libxl creates array of
> > struct.
> >
> > For example, libxl_domain_config_from_json calls libxl_domain_config_init
> > which initializes all child structures and arrays. But then when libxl parses
> > json and creates the array of structure, it doesn't initialize array elements
> > properly (see libxl__domain_build_info_parse_json iomem parsing):
> >
> > p->num_iomem = x->u.array->count;
> > p->iomem = libxl__calloc(NOGC, p->num_iomem, sizeof(*p->iomem));
> > if (!p->iomem && p->num_iomem != 0) {
> >      rc = -1;
> >      goto out;
> > }
> > for (i=0; (t=libxl__json_array_get(x,i)); i++) {
> >      rc = libxl__iomem_range_parse_json(gc, t, &p->iomem[i]);
> >      if (rc)
> >         goto out;
> > }
> >
> > libxl creates array element with calloc function, so all element
> > fields are initialized
> > with zero values. Even some of them have default value different from zero.
> > For these purpose dedicated init function should be called for each element.
> > Above example should be:
> >
> > for (i=0; (t=libxl__json_array_get(x,i)); i++) {
> >      libxl_iomem_range_init(&p->iomem[i]);
> >      rc = libxl__iomem_range_parse_json(gc, t, &p->iomem[i]);
> >      if (rc)
> >         goto out;
> > }
>
> Not initializing the values is fine as long as the JSON describes all the fields
> of the structure.
>
> The key point here is the GFN is not described in the JSON (see
> libxl_iomem_range_gen_json) if it is equal to LIBXL_INVALID_GFN. As the field is
> not described, it will be defaulted to 0.
>

Yes. So, either  ..._gen_json should generate entries for default
values or ..._parse_json
should set missing entries to its default values. Second solution looks more
correct.

> >
> > I've changes gentypes.py as following:
> >
> > diff --git a/tools/libxl/gentypes.py b/tools/libxl/gentypes.py
> > index 88e5c5f30e..92e28be469 100644
> > --- a/tools/libxl/gentypes.py
> > +++ b/tools/libxl/gentypes.py
> > @@ -454,6 +454,8 @@ def libxl_C_type_parse_json(ty, w, v, indent = "
> >   ", parent = None, discrimina
> >           s += "        goto out;\n"
> >           s += "    }\n"
> >           s += "    for (i=0; (t=libxl__json_array_get(x,i)); i++) {\n"
> > +        if ty.elem_type.init_fn is not None and
> > ty.elem_type.autogenerate_init_fn:
>
> My knowledge of libxl is quite limited. But I don't think this is correct, you
> want to call init_fn whether this has been autogenerated or not.
>
> > +            s += indent + "    "+"%s_init(&%s[i]);\n" %
> > (ty.elem_type.typename, v)
>
> Looking at the other usage (like _libxl_C_type_init), init_fn is called with
>
>              s += "%s(%s);\n" % (ty.init_fn, ty.pass_arg(v, parent is None))
>
> I am also not entirely sure whether we should also cater the ty.init_val != None
> as well here.
>
> >           s += libxl_C_type_parse_json(ty.elem_type, "t", v+"[i]",
> >                                        indent + "    ", parent)
> >           s += "    }\n"
> >
> > I'm not sure is it right and complete fix.
> >
> > Ian, could you review?
> >
> > If the fix is ok, I will submit the patch.
>
> IHMO, the idea is there. The code may need some modifications (see above).
> Please post a patch so we can go forward in the process to review it.

Thanks. I will post the separate path.

>
> Cheers,
>
> --
> Julien Grall
diff mbox series

Patch

diff --git a/tools/libxl/libxl_domain.c b/tools/libxl/libxl_domain.c
index 9d0eb5aed1..0ae16a5b12 100644
--- a/tools/libxl/libxl_domain.c
+++ b/tools/libxl/libxl_domain.c
@@ -2120,6 +2120,15 @@  static void retrieve_domain_configuration_end(libxl__egc *egc,
         }
     }
 
+    /* reset IOMEM's GFN to initial value */
+    {
+        int i;
+
+        for (i = 0; i < d_config->b_info.num_iomem; i++)
+            if (d_config->b_info.iomem[i].gfn == 0)
+                d_config->b_info.iomem[i].gfn = LIBXL_INVALID_GFN;
+    }
+
     /* Devices: disk, nic, vtpm, pcidev etc. */
 
     /* The MERGE macro implements following logic: