diff mbox

[v8,08/24] x86: refactor psr: set value: implement framework.

Message ID 1487148579-7243-9-git-send-email-yi.y.sun@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yi Sun Feb. 15, 2017, 8:49 a.m. UTC
As set value flow is the most complicated one in psr, it will be
divided to some patches to make things clearer. This patch
implements the set value framework to show a whole picture firstly.

It also changes domctl interface to make it more general.

To make the set value flow be general and can support multiple features
at same time, it includes below steps:
1. Get COS ID of current domain using.
2. Assemble a value array to store all features current value
   in it and replace the current value of the feature which is
   being set to the new input value.
3. Find if there is already a COS ID on which all features'
   values are same as the array. Then, we can reuse this COS
   ID.
4. If fail to find, we need pick an available COS ID. Only COS ID which ref
   is 0 or 1 can be picked.
5. Write all features MSRs according to the COS ID.
6. Update ref according to COS ID.
7. Save the COS ID into current domain's psr_cos_ids[socket] so that we
   can know which COS the domain is using on the socket.

So, some functions are abstracted and the callback functions will be
implemented in next patches.

Here is an example to understand the process. The CPU supports
two featuers, e.g. L3 CAT and L2 CAT. user wants to set L3 CAT
of Dom1 to 0x1ff.
1. Get the old_cos of Dom1 which is 0. L3 CAT is the first
element of feature list. The COS registers values are below at
this time.
        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | ...   | ...   | ... |
        -------------------------------
L2 CAT  | 0xff  | ...   | ...   | ... |
        -------------------------------

2. Assemble The value array to be:
val[0]: 0x1ff
val[1]: 0xff

3. It cannot find a matching COS.

4. Allocate COS 1 to store the value set.

5. Write the COS 1 registers. The COS registers values are
changed to below now.
        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
        -------------------------------
L2 CAT  | 0xff  | 0xff  | ...   | ... |
        -------------------------------

6. The ref[1] is increased to 1 because Dom1 is using it now.

7. Save 1 to Dom1's psr_cos_ids[socket].

Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos
of Dom2 is 0 too. Repeat above flow.

The val array assembled is:
val[0]: 0x1ff
val[1]: 0xff

So, it can find a matching COS, COS 1. Then, it can reuse COS 1
for Dom2.

The ref[1] is increased to 2 now because both Dom1 and Dom2 are
using this COS ID. Set 1 to Dom2's psr_cos_ids[socket].

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 xen/arch/x86/domctl.c     |  18 ++---
 xen/arch/x86/psr.c        | 202 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/psr.h |   4 +-
 3 files changed, 210 insertions(+), 14 deletions(-)

Comments

Wei Liu Feb. 26, 2017, 5:41 p.m. UTC | #1
On Wed, Feb 15, 2017 at 04:49:23PM +0800, Yi Sun wrote:
[...]
> +int psr_set_val(struct domain *d, unsigned int socket,
> +                uint64_t val, enum cbm_type type)

IMHO it would be far better to use goto style error handling in such a
complex function. You can avoid missing one of the exit paths when
refactoring this function later.

> +{
> +    unsigned int old_cos;
> +    int cos, ret;
> +    unsigned int *ref;
> +    uint64_t *val_array;
> +    struct psr_socket_info *info = get_socket_info(socket);
> +    uint32_t array_len;
> +    enum psr_feat_type feat_type;
> +
> +    if ( IS_ERR(info) )
> +        return PTR_ERR(info);
> +
> +    feat_type = psr_cbm_type_to_feat_type(type);
> +    if ( !test_bit(feat_type, &info->feat_mask) )
> +        return -ENOENT;
> +
> +    /*
> +     * Step 0:
> +     * old_cos means the COS ID current domain is using. By default, it is 0.
> +     *
> +     * For every COS ID, there is a reference count to record how many domains
> +     * are using the COS register corresponding to this COS ID.
> +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> +     * - If ref[old_cos] is 1, that means this COS is only used by current
> +     *   domain.
> +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
> +     *   this COS.
> +     */
> +    old_cos = d->arch.psr_cos_ids[socket];
> +    if ( old_cos > MAX_COS_REG_CNT )

How could this happen? This function is the setter of cos, it is a bug
if it ever sets a value larger than MAX_COS_REG_CNT.

> +        return -EOVERFLOW;
> +
> +    ref = info->cos_ref;
> +
> +    /*
> +     * Step 1:
> +     * Assemle a value array to store all featues cos_reg_val[old_cos].
> +     * And, set the input val into array according to the feature's
> +     * position in array.
> +     */
> +    array_len = get_cos_num(info);
> +    val_array = xzalloc_array(uint64_t, array_len);
> +    if ( !val_array )
> +        return -ENOMEM;
> +
> +    if ( (ret = assemble_val_array(val_array, array_len, info, old_cos)) != 0 )
> +    {
> +        xfree(val_array);
> +        return ret;
> +    }
> +
> +    if ( (ret = set_new_val_to_array(val_array, array_len, info,
> +                                     feat_type, type, val)) != 0 )
> +    {
> +        xfree(val_array);
> +        return ret;
> +    }
> +
> +    /*
> +     * Lock here to make sure the ref is not changed during find and
> +     * write process.
> +     */
> +    spin_lock(&info->ref_lock);
> +
> +    /*
> +     * Step 2:
> +     * Try to find if there is already a COS ID on which all features' values
> +     * are same as the array. Then, we can reuse this COS ID.
> +     */
> +    cos = find_cos(val_array, array_len, feat_type, info);
> +    if ( cos >= 0 )
> +    {
> +        if ( cos == old_cos )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return 0;
> +        }
> +    }
> +    else
> +    {
> +        /*
> +         * Step 3:
> +         * If fail to find, we need allocate a new COS ID.
> +         * If multiple domains are using same COS ID, its ref is more
> +         * than 1. That means we cannot free this COS to make current domain
> +         * use it. Because other domains are using the value saved in the COS.
> +         * Unless the ref is changed to 1 (mean only current domain is using
> +         * it), we cannot allocate the COS ID to current domain.
> +         * So, only the COS ID which ref is 1 or 0 can be allocated.
> +         */
> +        cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);
> +        if ( cos < 0 )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return cos;
> +        }
> +
> +        /*
> +         * Step 4:
> +         * Write all features MSRs according to the COS ID.
> +         */
> +        ret = write_psr_msr(socket, cos, val_array);
> +        if ( ret )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return ret;
> +        }
> +    }
> +
> +    /*
> +     * Step 5:
> +     * Update ref according to COS ID.
> +     */
> +    ref[cos]++;
> +    ASSERT(ref[cos] || cos == 0);
> +    ref[old_cos]--;
> +    spin_unlock(&info->ref_lock);
> +
> +    /*
> +     * Step 6:
> +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> +     * which COS the domain is using on the socket. One domain can only use
> +     * one COS ID at same time on each socket.
> +     */
> +    d->arch.psr_cos_ids[socket] = cos;
> +    xfree(val_array);
> +
> +    return 0;
> +}
> +
>  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
>  static void psr_free_cos(struct domain *d)
>  {
> -    if( !d->arch.psr_cos_ids )
> +    unsigned int socket, cos;
> +
> +    if ( !d->arch.psr_cos_ids )
>          return;
>  
> +    /* Domain is free so its cos_ref should be decreased. */
> +    for ( socket = 0; socket < nr_sockets; socket++ )
> +    {
> +        struct psr_socket_info *info;
> +
> +        /* cos 0 is default one which does not need be handled. */
> +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )

Break this into two lines. The assignment doesn't have to happen within
if().

Wei.
Yi Sun Feb. 27, 2017, 7:06 a.m. UTC | #2
On 17-02-26 17:41:43, Wei Liu wrote:
> On Wed, Feb 15, 2017 at 04:49:23PM +0800, Yi Sun wrote:
> [...]
> > +int psr_set_val(struct domain *d, unsigned int socket,
> > +                uint64_t val, enum cbm_type type)
> 
> IMHO it would be far better to use goto style error handling in such a
> complex function. You can avoid missing one of the exit paths when
> refactoring this function later.
> 
Ok, I will consider to use goto for error handling.

> > +{
> > +    unsigned int old_cos;
> > +    int cos, ret;
> > +    unsigned int *ref;
> > +    uint64_t *val_array;
> > +    struct psr_socket_info *info = get_socket_info(socket);
> > +    uint32_t array_len;
> > +    enum psr_feat_type feat_type;
> > +
> > +    if ( IS_ERR(info) )
> > +        return PTR_ERR(info);
> > +
> > +    feat_type = psr_cbm_type_to_feat_type(type);
> > +    if ( !test_bit(feat_type, &info->feat_mask) )
> > +        return -ENOENT;
> > +
> > +    /*
> > +     * Step 0:
> > +     * old_cos means the COS ID current domain is using. By default, it is 0.
> > +     *
> > +     * For every COS ID, there is a reference count to record how many domains
> > +     * are using the COS register corresponding to this COS ID.
> > +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> > +     * - If ref[old_cos] is 1, that means this COS is only used by current
> > +     *   domain.
> > +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
> > +     *   this COS.
> > +     */
> > +    old_cos = d->arch.psr_cos_ids[socket];
> > +    if ( old_cos > MAX_COS_REG_CNT )
> 
> How could this happen? This function is the setter of cos, it is a bug
> if it ever sets a value larger than MAX_COS_REG_CNT.
> 
You are right. This should not happen. This check is just a protection. If you
think it is not necessary, I will remove it.

> > +        return -EOVERFLOW;
> > +
> > +    ref = info->cos_ref;
> > +
> > +    /*
> > +     * Step 1:
> > +     * Assemle a value array to store all featues cos_reg_val[old_cos].
> > +     * And, set the input val into array according to the feature's
> > +     * position in array.
> > +     */
> > +    array_len = get_cos_num(info);
> > +    val_array = xzalloc_array(uint64_t, array_len);
> > +    if ( !val_array )
> > +        return -ENOMEM;
> > +
> > +    if ( (ret = assemble_val_array(val_array, array_len, info, old_cos)) != 0 )
> > +    {
> > +        xfree(val_array);
> > +        return ret;
> > +    }
> > +
> > +    if ( (ret = set_new_val_to_array(val_array, array_len, info,
> > +                                     feat_type, type, val)) != 0 )
> > +    {
> > +        xfree(val_array);
> > +        return ret;
> > +    }
> > +
> > +    /*
> > +     * Lock here to make sure the ref is not changed during find and
> > +     * write process.
> > +     */
> > +    spin_lock(&info->ref_lock);
> > +
> > +    /*
> > +     * Step 2:
> > +     * Try to find if there is already a COS ID on which all features' values
> > +     * are same as the array. Then, we can reuse this COS ID.
> > +     */
> > +    cos = find_cos(val_array, array_len, feat_type, info);
> > +    if ( cos >= 0 )
> > +    {
> > +        if ( cos == old_cos )
> > +        {
> > +            spin_unlock(&info->ref_lock);
> > +            xfree(val_array);
> > +            return 0;
> > +        }
> > +    }
> > +    else
> > +    {
> > +        /*
> > +         * Step 3:
> > +         * If fail to find, we need allocate a new COS ID.
> > +         * If multiple domains are using same COS ID, its ref is more
> > +         * than 1. That means we cannot free this COS to make current domain
> > +         * use it. Because other domains are using the value saved in the COS.
> > +         * Unless the ref is changed to 1 (mean only current domain is using
> > +         * it), we cannot allocate the COS ID to current domain.
> > +         * So, only the COS ID which ref is 1 or 0 can be allocated.
> > +         */
> > +        cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);
> > +        if ( cos < 0 )
> > +        {
> > +            spin_unlock(&info->ref_lock);
> > +            xfree(val_array);
> > +            return cos;
> > +        }
> > +
> > +        /*
> > +         * Step 4:
> > +         * Write all features MSRs according to the COS ID.
> > +         */
> > +        ret = write_psr_msr(socket, cos, val_array);
> > +        if ( ret )
> > +        {
> > +            spin_unlock(&info->ref_lock);
> > +            xfree(val_array);
> > +            return ret;
> > +        }
> > +    }
> > +
> > +    /*
> > +     * Step 5:
> > +     * Update ref according to COS ID.
> > +     */
> > +    ref[cos]++;
> > +    ASSERT(ref[cos] || cos == 0);
> > +    ref[old_cos]--;
> > +    spin_unlock(&info->ref_lock);
> > +
> > +    /*
> > +     * Step 6:
> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> > +     * which COS the domain is using on the socket. One domain can only use
> > +     * one COS ID at same time on each socket.
> > +     */
> > +    d->arch.psr_cos_ids[socket] = cos;
> > +    xfree(val_array);
> > +
> > +    return 0;
> > +}
> > +
> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
> >  static void psr_free_cos(struct domain *d)
> >  {
> > -    if( !d->arch.psr_cos_ids )
> > +    unsigned int socket, cos;
> > +
> > +    if ( !d->arch.psr_cos_ids )
> >          return;
> >  
> > +    /* Domain is free so its cos_ref should be decreased. */
> > +    for ( socket = 0; socket < nr_sockets; socket++ )
> > +    {
> > +        struct psr_socket_info *info;
> > +
> > +        /* cos 0 is default one which does not need be handled. */
> > +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
> 
> Break this into two lines. The assignment doesn't have to happen within
> if().
> 
Ok, thanks!

> Wei.
Jan Beulich Feb. 27, 2017, 10:55 a.m. UTC | #3
>>> Yi Sun <yi.y.sun@linux.intel.com> 02/27/17 8:06 AM >>>
>On 17-02-26 17:41:43, Wei Liu wrote:
>> On Wed, Feb 15, 2017 at 04:49:23PM +0800, Yi Sun wrote:
>> > +    /*
>> > +     * Step 0:
>> > +     * old_cos means the COS ID current domain is using. By default, it is 0.
>> > +     *
>> > +     * For every COS ID, there is a reference count to record how many domains
>> > +     * are using the COS register corresponding to this COS ID.
>> > +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
>> > +     * - If ref[old_cos] is 1, that means this COS is only used by current
>> > +     *   domain.
>> > +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
>> > +     *   this COS.
>> > +     */
>> > +    old_cos = d->arch.psr_cos_ids[socket];
>> > +    if ( old_cos > MAX_COS_REG_CNT )
>> 
>> How could this happen? This function is the setter of cos, it is a bug
>> if it ever sets a value larger than MAX_COS_REG_CNT.
>> 
>You are right. This should not happen. This check is just a protection. If you
>think it is not necessary, I will remove it.

In which case  - make in an ASSERT() instead of an if().

Jan
Roger Pau Monné Feb. 28, 2017, 1:58 p.m. UTC | #4
On Wed, Feb 15, 2017 at 04:49:23PM +0800, Yi Sun wrote:
> As set value flow is the most complicated one in psr, it will be
> divided to some patches to make things clearer. This patch
> implements the set value framework to show a whole picture firstly.
> 
> It also changes domctl interface to make it more general.
> 
> To make the set value flow be general and can support multiple features
> at same time, it includes below steps:
> 1. Get COS ID of current domain using.
> 2. Assemble a value array to store all features current value
>    in it and replace the current value of the feature which is
>    being set to the new input value.
> 3. Find if there is already a COS ID on which all features'
>    values are same as the array. Then, we can reuse this COS
>    ID.
> 4. If fail to find, we need pick an available COS ID. Only COS ID which ref
>    is 0 or 1 can be picked.
> 5. Write all features MSRs according to the COS ID.
> 6. Update ref according to COS ID.
> 7. Save the COS ID into current domain's psr_cos_ids[socket] so that we
>    can know which COS the domain is using on the socket.
> 
> So, some functions are abstracted and the callback functions will be
> implemented in next patches.
> 
> Here is an example to understand the process. The CPU supports
> two featuers, e.g. L3 CAT and L2 CAT. user wants to set L3 CAT
> of Dom1 to 0x1ff.
> 1. Get the old_cos of Dom1 which is 0. L3 CAT is the first
> element of feature list. The COS registers values are below at
> this time.
>         -------------------------------
>         | COS 0 | COS 1 | COS 2 | ... |
>         -------------------------------
> L3 CAT  | 0x7ff | ...   | ...   | ... |
>         -------------------------------
> L2 CAT  | 0xff  | ...   | ...   | ... |
>         -------------------------------
> 
> 2. Assemble The value array to be:
> val[0]: 0x1ff
> val[1]: 0xff
> 
> 3. It cannot find a matching COS.
> 
> 4. Allocate COS 1 to store the value set.
> 
> 5. Write the COS 1 registers. The COS registers values are
> changed to below now.
>         -------------------------------
>         | COS 0 | COS 1 | COS 2 | ... |
>         -------------------------------
> L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
>         -------------------------------
> L2 CAT  | 0xff  | 0xff  | ...   | ... |
>         -------------------------------
> 
> 6. The ref[1] is increased to 1 because Dom1 is using it now.
> 
> 7. Save 1 to Dom1's psr_cos_ids[socket].
> 
> Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos
> of Dom2 is 0 too. Repeat above flow.
> 
> The val array assembled is:
> val[0]: 0x1ff
> val[1]: 0xff
> 
> So, it can find a matching COS, COS 1. Then, it can reuse COS 1
> for Dom2.
> 
> The ref[1] is increased to 2 now because both Dom1 and Dom2 are
> using this COS ID. Set 1 to Dom2's psr_cos_ids[socket].
> 
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
>  xen/arch/x86/domctl.c     |  18 ++---
>  xen/arch/x86/psr.c        | 202 +++++++++++++++++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/psr.h |   4 +-
>  3 files changed, 210 insertions(+), 14 deletions(-)
[...]
> diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
> index c1afd36..d414b5e 100644
> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -546,18 +546,214 @@ int psr_get_val(struct domain *d, unsigned int socket,
>      return psr_get(socket, type, NULL, 0, d, val);
>  }
>  
> -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> -                   uint64_t cbm, enum cbm_type type)
> +/* Set value functions */
> +static unsigned int get_cos_num(const struct psr_socket_info *info)
>  {
>      return 0;
>  }
>  
> +static int assemble_val_array(uint64_t *val,
> +                              uint32_t array_len,
> +                              const struct psr_socket_info *info,
> +                              unsigned int old_cos)
> +{
> +    return -EINVAL;
> +}
> +
> +static int set_new_val_to_array(uint64_t *val,
> +                                uint32_t array_len,
> +                                const struct psr_socket_info *info,
> +                                enum psr_feat_type feat_type,
> +                                enum cbm_type type,
> +                                uint64_t m)
> +{
> +    return -EINVAL;
> +}
> +
> +static int find_cos(const uint64_t *val, uint32_t array_len,
> +                    enum psr_feat_type feat_type,
> +                    const struct psr_socket_info *info)
> +{
    ASSERT(spin_is_locked(info->ref_lock));
> +    return -ENOENT;
> +}
> +
> +static int pick_avail_cos(const struct psr_socket_info *info,
> +                          const uint64_t *val, uint32_t array_len,
> +                          unsigned int old_cos,
> +                          enum psr_feat_type feat_type)
> +{
    ASSERT(spin_is_locked(info->ref_lock));
> +    return -ENOENT;
> +}
> +
> +static int write_psr_msr(unsigned int socket, unsigned int cos,
> +                         const uint64_t *val)
> +{
    ASSERT(spin_is_locked(info->ref_lock));
> +    return -ENOENT;
> +}
> +
> +int psr_set_val(struct domain *d, unsigned int socket,
> +                uint64_t val, enum cbm_type type)
> +{
> +    unsigned int old_cos;
> +    int cos, ret;
> +    unsigned int *ref;
> +    uint64_t *val_array;
> +    struct psr_socket_info *info = get_socket_info(socket);
> +    uint32_t array_len;
> +    enum psr_feat_type feat_type;
> +
> +    if ( IS_ERR(info) )
> +        return PTR_ERR(info);
> +
> +    feat_type = psr_cbm_type_to_feat_type(type);
> +    if ( !test_bit(feat_type, &info->feat_mask) )
> +        return -ENOENT;
> +
> +    /*
> +     * Step 0:
> +     * old_cos means the COS ID current domain is using. By default, it is 0.
> +     *
> +     * For every COS ID, there is a reference count to record how many domains
> +     * are using the COS register corresponding to this COS ID.
> +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> +     * - If ref[old_cos] is 1, that means this COS is only used by current
> +     *   domain.
> +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
> +     *   this COS.
> +     */
> +    old_cos = d->arch.psr_cos_ids[socket];
> +    if ( old_cos > MAX_COS_REG_CNT )
> +        return -EOVERFLOW;
> +
> +    ref = info->cos_ref;
> +
> +    /*
> +     * Step 1:
> +     * Assemle a value array to store all featues cos_reg_val[old_cos].
> +     * And, set the input val into array according to the feature's
> +     * position in array.
> +     */
> +    array_len = get_cos_num(info);
> +    val_array = xzalloc_array(uint64_t, array_len);
> +    if ( !val_array )
> +        return -ENOMEM;
> +
> +    if ( (ret = assemble_val_array(val_array, array_len, info, old_cos)) != 0 )
> +    {
> +        xfree(val_array);
> +        return ret;
> +    }
> +
> +    if ( (ret = set_new_val_to_array(val_array, array_len, info,
> +                                     feat_type, type, val)) != 0 )
> +    {
> +        xfree(val_array);
> +        return ret;
> +    }
> +
> +    /*
> +     * Lock here to make sure the ref is not changed during find and
> +     * write process.
> +     */
> +    spin_lock(&info->ref_lock);
> +
> +    /*
> +     * Step 2:
> +     * Try to find if there is already a COS ID on which all features' values
> +     * are same as the array. Then, we can reuse this COS ID.
> +     */
> +    cos = find_cos(val_array, array_len, feat_type, info);
> +    if ( cos >= 0 )
> +    {
> +        if ( cos == old_cos )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return 0;
> +        }
> +    }
> +    else
> +    {
> +        /*
> +         * Step 3:
> +         * If fail to find, we need allocate a new COS ID.
> +         * If multiple domains are using same COS ID, its ref is more
> +         * than 1. That means we cannot free this COS to make current domain
> +         * use it. Because other domains are using the value saved in the COS.
> +         * Unless the ref is changed to 1 (mean only current domain is using
> +         * it), we cannot allocate the COS ID to current domain.
> +         * So, only the COS ID which ref is 1 or 0 can be allocated.
> +         */
> +        cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);
> +        if ( cos < 0 )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return cos;
> +        }
> +
> +        /*
> +         * Step 4:
> +         * Write all features MSRs according to the COS ID.
> +         */
> +        ret = write_psr_msr(socket, cos, val_array);
> +        if ( ret )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return ret;
> +        }
> +    }
> +
> +    /*
> +     * Step 5:
> +     * Update ref according to COS ID.
> +     */
> +    ref[cos]++;
> +    ASSERT(ref[cos] || cos == 0);
> +    ref[old_cos]--;

If cos == 0 you can overflow ref[0].

> +    spin_unlock(&info->ref_lock);
> +
> +    /*
> +     * Step 6:
> +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> +     * which COS the domain is using on the socket. One domain can only use
> +     * one COS ID at same time on each socket.
> +     */
> +    d->arch.psr_cos_ids[socket] = cos;
> +    xfree(val_array);
> +
> +    return 0;
> +}
> +
>  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
>  static void psr_free_cos(struct domain *d)
>  {
> -    if( !d->arch.psr_cos_ids )
> +    unsigned int socket, cos;

An ASSERT that the domain is locked would be better than a comment.

> +    if ( !d->arch.psr_cos_ids )
>          return;
>  
> +    /* Domain is free so its cos_ref should be decreased. */
> +    for ( socket = 0; socket < nr_sockets; socket++ )
> +    {
> +        struct psr_socket_info *info;
> +
> +        /* cos 0 is default one which does not need be handled. */
> +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
> +            continue;
> +
> +        /*
> +         * If domain uses other cos ids, all corresponding refs must have been
> +         * increased 1 for this domain. So, we need decrease them.
> +         */
> +        info = socket_info + socket;
> +        ASSERT(info->cos_ref[cos] || cos == 0);
> +        spin_lock(&info->ref_lock);
> +        info->cos_ref[cos]--;

If cos == 0, it is possible to reach this with info->cos_ref[cos] == 0, in
which case you are overflowing it?

> +        spin_unlock(&info->ref_lock);
> +    }
> +
>      xfree(d->arch.psr_cos_ids);
>      d->arch.psr_cos_ids = NULL;
>  }
> diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
> index 569e7df..fde7882 100644
> --- a/xen/include/asm-x86/psr.h
> +++ b/xen/include/asm-x86/psr.h
> @@ -72,8 +72,8 @@ int psr_get_info(unsigned int socket, enum cbm_type type,
>                   uint32_t data[], unsigned int array_len);
>  int psr_get_val(struct domain *d, unsigned int socket,
>                  uint64_t *val, enum cbm_type type);
> -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> -                   uint64_t cbm, enum cbm_type type);
> +int psr_set_val(struct domain *d, unsigned int socket,
> +                uint64_t val, enum cbm_type type);
>  
>  int psr_domain_init(struct domain *d);
>  void psr_domain_free(struct domain *d);
> -- 
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
Yi Sun March 1, 2017, 6:23 a.m. UTC | #5
On 17-02-28 13:58:55, Roger Pau Monn� wrote:
> > +static int find_cos(const uint64_t *val, uint32_t array_len,
> > +                    enum psr_feat_type feat_type,
> > +                    const struct psr_socket_info *info)
> > +{
>     ASSERT(spin_is_locked(info->ref_lock));
> > +    return -ENOENT;
> > +}
> > +
> > +static int pick_avail_cos(const struct psr_socket_info *info,
> > +                          const uint64_t *val, uint32_t array_len,
> > +                          unsigned int old_cos,
> > +                          enum psr_feat_type feat_type)
> > +{
>     ASSERT(spin_is_locked(info->ref_lock));
> > +    return -ENOENT;
> > +}
> > +
> > +static int write_psr_msr(unsigned int socket, unsigned int cos,
> > +                         const uint64_t *val)
> > +{
>     ASSERT(spin_is_locked(info->ref_lock));
> > +    return -ENOENT;
> > +}
> > +
Thank you!

> > +int psr_set_val(struct domain *d, unsigned int socket,
> > +                uint64_t val, enum cbm_type type)
> > +{

[...]

> > +
> > +    /*
> > +     * Step 5:
> > +     * Update ref according to COS ID.
> > +     */
> > +    ref[cos]++;
> > +    ASSERT(ref[cos] || cos == 0);
> > +    ref[old_cos]--;
> 
> If cos == 0 you can overflow ref[0].
> 
COS[0] stores the default value only. The value in it cannot be changed. So we
do not check its ref.

> > +    spin_unlock(&info->ref_lock);
> > +
> > +    /*
> > +     * Step 6:
> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> > +     * which COS the domain is using on the socket. One domain can only use
> > +     * one COS ID at same time on each socket.
> > +     */
> > +    d->arch.psr_cos_ids[socket] = cos;
> > +    xfree(val_array);
> > +
> > +    return 0;
> > +}
> > +
> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
> >  static void psr_free_cos(struct domain *d)
> >  {
> > -    if( !d->arch.psr_cos_ids )
> > +    unsigned int socket, cos;
> 
> An ASSERT that the domain is locked would be better than a comment.
> 
psr_free_cos is called by psr_domain_free which is finally called by
put_domain().

    #define put_domain(_d) \
        if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)

So, it is protected by refcnt.

Do you think it is necessary to check refcnt here? If codes outside have
something error to breake this protection, we cannot assure if below codes
go wrong even by ASSERT check here.

> > +    if ( !d->arch.psr_cos_ids )
> >          return;
> >  
> > +    /* Domain is free so its cos_ref should be decreased. */
> > +    for ( socket = 0; socket < nr_sockets; socket++ )
> > +    {
> > +        struct psr_socket_info *info;
> > +
> > +        /* cos 0 is default one which does not need be handled. */
> > +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
> > +            continue;
> > +
> > +        /*
> > +         * If domain uses other cos ids, all corresponding refs must have been
> > +         * increased 1 for this domain. So, we need decrease them.
> > +         */
> > +        info = socket_info + socket;
> > +        ASSERT(info->cos_ref[cos] || cos == 0);
> > +        spin_lock(&info->ref_lock);
> > +        info->cos_ref[cos]--;
> 
> If cos == 0, it is possible to reach this with info->cos_ref[cos] == 0, in
> which case you are overflowing it?
> 
As above explanation, we do not care the ref count of COS[0].

> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > https://lists.xen.org/xen-devel
Jan Beulich March 8, 2017, 4:07 p.m. UTC | #6
>>> On 15.02.17 at 09:49, <yi.y.sun@linux.intel.com> wrote:
> As set value flow is the most complicated one in psr, it will be
> divided to some patches to make things clearer. This patch
> implements the set value framework to show a whole picture firstly.
> 
> It also changes domctl interface to make it more general.
> 
> To make the set value flow be general and can support multiple features
> at same time, it includes below steps:
> 1. Get COS ID of current domain using.

What is the "using" here supposed to mean?

> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -546,18 +546,214 @@ int psr_get_val(struct domain *d, unsigned int socket,
>      return psr_get(socket, type, NULL, 0, d, val);
>  }
>  
> -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> -                   uint64_t cbm, enum cbm_type type)
> +/* Set value functions */
> +static unsigned int get_cos_num(const struct psr_socket_info *info)
>  {
>      return 0;
>  }
>  
> +static int assemble_val_array(uint64_t *val,
> +                              uint32_t array_len,
> +                              const struct psr_socket_info *info,
> +                              unsigned int old_cos)
> +{
> +    return -EINVAL;
> +}
> +
> +static int set_new_val_to_array(uint64_t *val,

insert_new_val() ? And when talking about arrays, as indicated
before, please use [] notation instead of pointers. This is
particularly relevant when the function name suggests that it
would be "val" which gets inserted, but aiui it is really ...

> +                                uint32_t array_len,
> +                                const struct psr_socket_info *info,
> +                                enum psr_feat_type feat_type,
> +                                enum cbm_type type,
> +                                uint64_t m)

... "m". Therefore please also consider better naming of parameters.

> +static int write_psr_msr(unsigned int socket, unsigned int cos,
> +                         const uint64_t *val)
> +{
> +    return -ENOENT;
> +}

Is this function intended you write just one MSR, or potentially many?
In the latter case the name would perhaps better be "write_psr_msrs()".

> +int psr_set_val(struct domain *d, unsigned int socket,
> +                uint64_t val, enum cbm_type type)
> +{
> +    unsigned int old_cos;
> +    int cos, ret;
> +    unsigned int *ref;
> +    uint64_t *val_array;
> +    struct psr_socket_info *info = get_socket_info(socket);
> +    uint32_t array_len;
> +    enum psr_feat_type feat_type;
> +
> +    if ( IS_ERR(info) )
> +        return PTR_ERR(info);
> +
> +    feat_type = psr_cbm_type_to_feat_type(type);
> +    if ( !test_bit(feat_type, &info->feat_mask) )
> +        return -ENOENT;
> +
> +    /*
> +     * Step 0:
> +     * old_cos means the COS ID current domain is using. By default, it is 0.
> +     *
> +     * For every COS ID, there is a reference count to record how many domains
> +     * are using the COS register corresponding to this COS ID.
> +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> +     * - If ref[old_cos] is 1, that means this COS is only used by current
> +     *   domain.
> +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
> +     *   this COS.
> +     */
> +    old_cos = d->arch.psr_cos_ids[socket];
> +    if ( old_cos > MAX_COS_REG_CNT )
> +        return -EOVERFLOW;

Doesn't this need to be >= ? And isn't this happening an indication
of a bug, i.e. shouldn't there be an ASSERT_UNREACHABLE() ahead
of the return?

> +    ref = info->cos_ref;
> +
> +    /*
> +     * Step 1:
> +     * Assemle a value array to store all featues cos_reg_val[old_cos].

Assemble ... features ...

> +     * And, set the input val into array according to the feature's
> +     * position in array.
> +     */
> +    array_len = get_cos_num(info);
> +    val_array = xzalloc_array(uint64_t, array_len);
> +    if ( !val_array )
> +        return -ENOMEM;
> +
> +    if ( (ret = assemble_val_array(val_array, array_len, info, old_cos)) != 0 )
> +    {
> +        xfree(val_array);
> +        return ret;
> +    }
> +
> +    if ( (ret = set_new_val_to_array(val_array, array_len, info,
> +                                     feat_type, type, val)) != 0 )
> +    {
> +        xfree(val_array);
> +        return ret;
> +    }
> +
> +    /*
> +     * Lock here to make sure the ref is not changed during find and
> +     * write process.
> +     */
> +    spin_lock(&info->ref_lock);

Once again I don't think the comment is very useful - what you say
is the ordinary purpose of acquiring a lock.

> +    /*
> +     * Step 2:
> +     * Try to find if there is already a COS ID on which all features' values
> +     * are same as the array. Then, we can reuse this COS ID.
> +     */
> +    cos = find_cos(val_array, array_len, feat_type, info);
> +    if ( cos >= 0 )
> +    {
> +        if ( cos == old_cos )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return 0;
> +        }

You could save a level of indentation if you inverted the outer if()'s
condition and made the code above "else if".

> +    }
> +    else
> +    {
> +        /*
> +         * Step 3:
> +         * If fail to find, we need allocate a new COS ID.
> +         * If multiple domains are using same COS ID, its ref is more
> +         * than 1. That means we cannot free this COS to make current domain
> +         * use it. Because other domains are using the value saved in the COS.
> +         * Unless the ref is changed to 1 (mean only current domain is using
> +         * it), we cannot allocate the COS ID to current domain.

I think I had been confused by this already before, and I continue to
be: How could ref be "changed to 1" here, and then have said
meaning? If you refer to the value after a possible decrement, the
value then being 1 means there is another domain using it. Hence ...

> +         * So, only the COS ID which ref is 1 or 0 can be allocated.

... I think this is not generally correct either: A COS with ref 1 can
only be re-used it that's old_cos. In all other cases only ref 0 ones
are candidates. But anyway I think the comment belongs into the
function, which would then allow for seeing it be added along with
the actual code, making it possible to check that both match up.

> +         */
> +        cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);
> +        if ( cos < 0 )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return cos;
> +        }
> +
> +        /*
> +         * Step 4:
> +         * Write all features MSRs according to the COS ID.
> +         */
> +        ret = write_psr_msr(socket, cos, val_array);
> +        if ( ret )
> +        {
> +            spin_unlock(&info->ref_lock);
> +            xfree(val_array);
> +            return ret;
> +        }

These recurring error paths could certainly do with folding.

> +    }
> +
> +    /*
> +     * Step 5:
> +     * Update ref according to COS ID.
> +     */
> +    ref[cos]++;
> +    ASSERT(ref[cos] || cos == 0);

    ASSERT(!cos || ref[cos]);
    ASSERT(!old_cos || ref[old_cos]);

> +    ref[old_cos]--;
> +    spin_unlock(&info->ref_lock);
> +
> +    /*
> +     * Step 6:
> +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> +     * which COS the domain is using on the socket. One domain can only use
> +     * one COS ID at same time on each socket.
> +     */
> +    d->arch.psr_cos_ids[socket] = cos;

So the domain has not been paused, i.e. some of its vCPU-s may
be running on other pCPU-s (including ones on the socket in
question). How come it is safe to update this value here?

>  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
>  static void psr_free_cos(struct domain *d)
>  {
> -    if( !d->arch.psr_cos_ids )
> +    unsigned int socket, cos;
> +
> +    if ( !d->arch.psr_cos_ids )
>          return;

As in an earlier patch I've asked for this check to be removed, I
think you will need to add a check on socket_info to be non-
NULL somewhere in this function.

> +    /* Domain is free so its cos_ref should be decreased. */

"Domain is free" ? DYM "is being destroyed"?

> +    for ( socket = 0; socket < nr_sockets; socket++ )
> +    {
> +        struct psr_socket_info *info;
> +
> +        /* cos 0 is default one which does not need be handled. */
> +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
> +            continue;
> +
> +        /*
> +         * If domain uses other cos ids, all corresponding refs must have been
> +         * increased 1 for this domain. So, we need decrease them.
> +         */
> +        info = socket_info + socket;
> +        ASSERT(info->cos_ref[cos] || cos == 0);
> +        spin_lock(&info->ref_lock);
> +        info->cos_ref[cos]--;
> +        spin_unlock(&info->ref_lock);

The ASSERT() is useful only inside the locked region.

Jan
Yi Sun March 10, 2017, 2:54 a.m. UTC | #7
On 17-03-08 09:07:10, Jan Beulich wrote:
> >>> On 15.02.17 at 09:49, <yi.y.sun@linux.intel.com> wrote:
> > As set value flow is the most complicated one in psr, it will be
> > divided to some patches to make things clearer. This patch
> > implements the set value framework to show a whole picture firstly.
> > 
> > It also changes domctl interface to make it more general.
> > 
> > To make the set value flow be general and can support multiple features
> > at same time, it includes below steps:
> > 1. Get COS ID of current domain using.
> 
> What is the "using" here supposed to mean?
> 
My meaning is to get the cos id that current domain is using. Sorry for this.
Will make it better.

> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -546,18 +546,214 @@ int psr_get_val(struct domain *d, unsigned int socket,
> >      return psr_get(socket, type, NULL, 0, d, val);
> >  }
> >  
> > -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> > -                   uint64_t cbm, enum cbm_type type)
> > +/* Set value functions */
> > +static unsigned int get_cos_num(const struct psr_socket_info *info)
> >  {
> >      return 0;
> >  }
> >  
> > +static int assemble_val_array(uint64_t *val,
> > +                              uint32_t array_len,
> > +                              const struct psr_socket_info *info,
> > +                              unsigned int old_cos)
> > +{
> > +    return -EINVAL;
> > +}
> > +
> > +static int set_new_val_to_array(uint64_t *val,
> 
> insert_new_val() ? And when talking about arrays, as indicated
> before, please use [] notation instead of pointers. This is
> particularly relevant when the function name suggests that it
> would be "val" which gets inserted, but aiui it is really ...
> 
> > +                                uint32_t array_len,
> > +                                const struct psr_socket_info *info,
> > +                                enum psr_feat_type feat_type,
> > +                                enum cbm_type type,
> > +                                uint64_t m)
> 
> ... "m". Therefore please also consider better naming of parameters.
> 
Sure, thanks!

> > +static int write_psr_msr(unsigned int socket, unsigned int cos,
> > +                         const uint64_t *val)
> > +{
> > +    return -ENOENT;
> > +}
> 
> Is this function intended you write just one MSR, or potentially many?
> In the latter case the name would perhaps better be "write_psr_msrs()".
> 
For one feature, it does set one MSR.

> > +int psr_set_val(struct domain *d, unsigned int socket,
> > +                uint64_t val, enum cbm_type type)
> > +{
> > +    unsigned int old_cos;
> > +    int cos, ret;
> > +    unsigned int *ref;
> > +    uint64_t *val_array;
> > +    struct psr_socket_info *info = get_socket_info(socket);
> > +    uint32_t array_len;
> > +    enum psr_feat_type feat_type;
> > +
> > +    if ( IS_ERR(info) )
> > +        return PTR_ERR(info);
> > +
> > +    feat_type = psr_cbm_type_to_feat_type(type);
> > +    if ( !test_bit(feat_type, &info->feat_mask) )
> > +        return -ENOENT;
> > +
> > +    /*
> > +     * Step 0:
> > +     * old_cos means the COS ID current domain is using. By default, it is 0.
> > +     *
> > +     * For every COS ID, there is a reference count to record how many domains
> > +     * are using the COS register corresponding to this COS ID.
> > +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> > +     * - If ref[old_cos] is 1, that means this COS is only used by current
> > +     *   domain.
> > +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
> > +     *   this COS.
> > +     */
> > +    old_cos = d->arch.psr_cos_ids[socket];
> > +    if ( old_cos > MAX_COS_REG_CNT )
> > +        return -EOVERFLOW;
> 
> Doesn't this need to be >= ? And isn't this happening an indication
> of a bug, i.e. shouldn't there be an ASSERT_UNREACHABLE() ahead
> of the return?
> 
Sorry. This has been corrected in next version. Thanks!

> > +    ref = info->cos_ref;
> > +
> > +    /*
> > +     * Step 1:
> > +     * Assemle a value array to store all featues cos_reg_val[old_cos].
> 
> Assemble ... features ...
> 
Oh, sorry.

[...]
> > +    /*
> > +     * Step 2:
> > +     * Try to find if there is already a COS ID on which all features' values
> > +     * are same as the array. Then, we can reuse this COS ID.
> > +     */
> > +    cos = find_cos(val_array, array_len, feat_type, info);
> > +    if ( cos >= 0 )
> > +    {
> > +        if ( cos == old_cos )
> > +        {
> > +            spin_unlock(&info->ref_lock);
> > +            xfree(val_array);
> > +            return 0;
> > +        }
> 
> You could save a level of indentation if you inverted the outer if()'s
> condition and made the code above "else if".
> 
Will consider it.

> > +    }
> > +    else
> > +    {
> > +        /*
> > +         * Step 3:
> > +         * If fail to find, we need allocate a new COS ID.
> > +         * If multiple domains are using same COS ID, its ref is more
> > +         * than 1. That means we cannot free this COS to make current domain
> > +         * use it. Because other domains are using the value saved in the COS.
> > +         * Unless the ref is changed to 1 (mean only current domain is using
> > +         * it), we cannot allocate the COS ID to current domain.
> 
> I think I had been confused by this already before, and I continue to
> be: How could ref be "changed to 1" here, and then have said
> meaning? If you refer to the value after a possible decrement, the
> value then being 1 means there is another domain using it. Hence ...
> 
> > +         * So, only the COS ID which ref is 1 or 0 can be allocated.
> 
> ... I think this is not generally correct either: A COS with ref 1 can
> only be re-used it that's old_cos. In all other cases only ref 0 ones
> are candidates. But anyway I think the comment belongs into the
> function, which would then allow for seeing it be added along with
> the actual code, making it possible to check that both match up.
> 
Sorry, the expression is not good. In fact, only COS ID which ref is 1 or 0
can be allocated to current domain. If old_cos is not 0 and its ref==1 means
that only current domain is using this old_cos ID. So, this old_cos ID is
certainly can be reused by current domain. Ref==0 means there is no any domain
using this COS ID. So it can be used too.

I will polish the comments.

> > +         */
> > +        cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);
> > +        if ( cos < 0 )
> > +        {
> > +            spin_unlock(&info->ref_lock);
> > +            xfree(val_array);
> > +            return cos;
> > +        }
> > +
> > +        /*
> > +         * Step 4:
> > +         * Write all features MSRs according to the COS ID.
> > +         */
> > +        ret = write_psr_msr(socket, cos, val_array);
> > +        if ( ret )
> > +        {
> > +            spin_unlock(&info->ref_lock);
> > +            xfree(val_array);
> > +            return ret;
> > +        }
> 
> These recurring error paths could certainly do with folding.
> 
Yes, Wei has suggested to use goto to handle them. This has been refined in
next version.

> > +    }
> > +
> > +    /*
> > +     * Step 5:
> > +     * Update ref according to COS ID.
> > +     */
> > +    ref[cos]++;
> > +    ASSERT(ref[cos] || cos == 0);
> 
>     ASSERT(!cos || ref[cos]);
>     ASSERT(!old_cos || ref[old_cos]);
> 
Ok, thanks!

> > +    ref[old_cos]--;
> > +    spin_unlock(&info->ref_lock);
> > +
> > +    /*
> > +     * Step 6:
> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> > +     * which COS the domain is using on the socket. One domain can only use
> > +     * one COS ID at same time on each socket.
> > +     */
> > +    d->arch.psr_cos_ids[socket] = cos;
> 
> So the domain has not been paused, i.e. some of its vCPU-s may
> be running on other pCPU-s (including ones on the socket in
> question). How come it is safe to update this value here?
> 
This is a domctl operation. It is protected by domctl_lock which is locked in
do_domctl().

> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
> >  static void psr_free_cos(struct domain *d)
> >  {
> > -    if( !d->arch.psr_cos_ids )
> > +    unsigned int socket, cos;
> > +
> > +    if ( !d->arch.psr_cos_ids )
> >          return;
> 
> As in an earlier patch I've asked for this check to be removed, I
> think you will need to add a check on socket_info to be non-
> NULL somewhere in this function.
> 
Ok, will do it in the loop.

> > +    /* Domain is free so its cos_ref should be decreased. */
> 
> "Domain is free" ? DYM "is being destroyed"?
> 
Yes. 

> > +    for ( socket = 0; socket < nr_sockets; socket++ )
> > +    {
> > +        struct psr_socket_info *info;
> > +
> > +        /* cos 0 is default one which does not need be handled. */
> > +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
> > +            continue;
> > +
> > +        /*
> > +         * If domain uses other cos ids, all corresponding refs must have been
> > +         * increased 1 for this domain. So, we need decrease them.
> > +         */
> > +        info = socket_info + socket;
> > +        ASSERT(info->cos_ref[cos] || cos == 0);
> > +        spin_lock(&info->ref_lock);
> > +        info->cos_ref[cos]--;
> > +        spin_unlock(&info->ref_lock);
> 
> The ASSERT() is useful only inside the locked region.
> 
Ok, thanks!

> Jan
Yi Sun March 10, 2017, 7:46 a.m. UTC | #8
On 17-03-08 09:07:10, Jan Beulich wrote:
[...]
> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
> >  static void psr_free_cos(struct domain *d)
> >  {
> > -    if( !d->arch.psr_cos_ids )
> > +    unsigned int socket, cos;
> > +
> > +    if ( !d->arch.psr_cos_ids )
> >          return;
> 
> As in an earlier patch I've asked for this check to be removed, I
> think you will need to add a check on socket_info to be non-
> NULL somewhere in this function.
> 
d->arch.psr_cos_ids is used in the loop below. Shall I not check it?
 
> > +    for ( socket = 0; socket < nr_sockets; socket++ )
> > +    {
> > +        struct psr_socket_info *info;
> > +
> > +        /* cos 0 is default one which does not need be handled. */
> > +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
Here.

> > +            continue;
> > +

BRs,
Sun Yi
Jan Beulich March 10, 2017, 9:09 a.m. UTC | #9
>>> On 10.03.17 at 03:54, <yi.y.sun@linux.intel.com> wrote:
> On 17-03-08 09:07:10, Jan Beulich wrote:
>> >>> On 15.02.17 at 09:49, <yi.y.sun@linux.intel.com> wrote:
>> > +static int write_psr_msr(unsigned int socket, unsigned int cos,
>> > +                         const uint64_t *val)
>> > +{
>> > +    return -ENOENT;
>> > +}
>> 
>> Is this function intended you write just one MSR, or potentially many?
>> In the latter case the name would perhaps better be "write_psr_msrs()".
>> 
> For one feature, it does set one MSR.

In which case - why is the value being passed by pointer?

>> > +    ref[old_cos]--;
>> > +    spin_unlock(&info->ref_lock);
>> > +
>> > +    /*
>> > +     * Step 6:
>> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
>> > +     * which COS the domain is using on the socket. One domain can only use
>> > +     * one COS ID at same time on each socket.
>> > +     */
>> > +    d->arch.psr_cos_ids[socket] = cos;
>> 
>> So the domain has not been paused, i.e. some of its vCPU-s may
>> be running on other pCPU-s (including ones on the socket in
>> question). How come it is safe to update this value here?
>> 
> This is a domctl operation. It is protected by domctl_lock which is locked in
> do_domctl().

But that lock doesn't keep the subject domain's vCPU-s from
running on other pCPU-s at the same time.

>> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
>> >  static void psr_free_cos(struct domain *d)
>> >  {
>> > -    if( !d->arch.psr_cos_ids )
>> > +    unsigned int socket, cos;
>> > +
>> > +    if ( !d->arch.psr_cos_ids )
>> >          return;
>> 
>> As in an earlier patch I've asked for this check to be removed, I
>> think you will need to add a check on socket_info to be non-
>> NULL somewhere in this function.
>> 
> Ok, will do it in the loop.

Please don't - loop invariants should be put outside of loops.

Jan
Jan Beulich March 10, 2017, 9:10 a.m. UTC | #10
>>> On 10.03.17 at 08:46, <yi.y.sun@linux.intel.com> wrote:
> On 17-03-08 09:07:10, Jan Beulich wrote:
> [...]
>> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
>> >  static void psr_free_cos(struct domain *d)
>> >  {
>> > -    if( !d->arch.psr_cos_ids )
>> > +    unsigned int socket, cos;
>> > +
>> > +    if ( !d->arch.psr_cos_ids )
>> >          return;
>> 
>> As in an earlier patch I've asked for this check to be removed, I
>> think you will need to add a check on socket_info to be non-
>> NULL somewhere in this function.
>> 
> d->arch.psr_cos_ids is used in the loop below. Shall I not check it?
>  
>> > +    for ( socket = 0; socket < nr_sockets; socket++ )
>> > +    {
>> > +        struct psr_socket_info *info;
>> > +
>> > +        /* cos 0 is default one which does not need be handled. */
>> > +        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
> Here.

Oh, yes, you should. But the check should be added here instead
of in the earlier patch.

Jan
Yi Sun March 13, 2017, 2:36 a.m. UTC | #11
On 17-03-10 02:09:55, Jan Beulich wrote:
> >>> On 10.03.17 at 03:54, <yi.y.sun@linux.intel.com> wrote:
> > On 17-03-08 09:07:10, Jan Beulich wrote:
> >> >>> On 15.02.17 at 09:49, <yi.y.sun@linux.intel.com> wrote:
> >> > +static int write_psr_msr(unsigned int socket, unsigned int cos,
> >> > +                         const uint64_t *val)
> >> > +{
> >> > +    return -ENOENT;
> >> > +}
> >> 
> >> Is this function intended you write just one MSR, or potentially many?
> >> In the latter case the name would perhaps better be "write_psr_msrs()".
> >> 
> > For one feature, it does set one MSR.
> 
> In which case - why is the value being passed by pointer?
>  
Will change it. My original meaning is, for example, when setting L3 CAT, only
one MSR is set.

> >> > +    ref[old_cos]--;
> >> > +    spin_unlock(&info->ref_lock);
> >> > +
> >> > +    /*
> >> > +     * Step 6:
> >> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> >> > +     * which COS the domain is using on the socket. One domain can only use
> >> > +     * one COS ID at same time on each socket.
> >> > +     */
> >> > +    d->arch.psr_cos_ids[socket] = cos;
> >> 
> >> So the domain has not been paused, i.e. some of its vCPU-s may
> >> be running on other pCPU-s (including ones on the socket in
> >> question). How come it is safe to update this value here?
> >> 
> > This is a domctl operation. It is protected by domctl_lock which is locked in
> > do_domctl().
> 
> But that lock doesn't keep the subject domain's vCPU-s from
> running on other pCPU-s at the same time.
> 
Yes, you are right. But only 'psr_ctxt_switch_to()' can access
psr_cos_ids[socket] when psr_cos_ids[socket] is being set. 'psr_ctxt_switch_to()'
read the cos and set it to ASSOC register. Context switch is short so that the
correct cos can be set to ASSOC register in a short time.

BRs,
Sun Yi
Jan Beulich March 13, 2017, 12:35 p.m. UTC | #12
>>> On 13.03.17 at 03:36, <yi.y.sun@linux.intel.com> wrote:
> On 17-03-10 02:09:55, Jan Beulich wrote:
>> >>> On 10.03.17 at 03:54, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-03-08 09:07:10, Jan Beulich wrote:
>> >> >>> On 15.02.17 at 09:49, <yi.y.sun@linux.intel.com> wrote:
>> >> > +    ref[old_cos]--;
>> >> > +    spin_unlock(&info->ref_lock);
>> >> > +
>> >> > +    /*
>> >> > +     * Step 6:
>> >> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
>> >> > +     * which COS the domain is using on the socket. One domain can only use
>> >> > +     * one COS ID at same time on each socket.
>> >> > +     */
>> >> > +    d->arch.psr_cos_ids[socket] = cos;
>> >> 
>> >> So the domain has not been paused, i.e. some of its vCPU-s may
>> >> be running on other pCPU-s (including ones on the socket in
>> >> question). How come it is safe to update this value here?
>> >> 
>> > This is a domctl operation. It is protected by domctl_lock which is locked in
>> > do_domctl().
>> 
>> But that lock doesn't keep the subject domain's vCPU-s from
>> running on other pCPU-s at the same time.
>> 
> Yes, you are right. But only 'psr_ctxt_switch_to()' can access
> psr_cos_ids[socket] when psr_cos_ids[socket] is being set. 'psr_ctxt_switch_to()'
> read the cos and set it to ASSOC register. Context switch is short so that the
> correct cos can be set to ASSOC register in a short time.

That's a reply you should never give: No matter how short a time
window, eventually it'll be hit. A very good example of this is the
VMCS race we've recently fixed (commit 2f4d2198a9), and which
had been there for years until it was first observed (and after
that it took another year or so until we've actually managed to
figure out what's going on).

Jan
Yi Sun March 14, 2017, 2:43 a.m. UTC | #13
On 17-03-13 06:35:33, Jan Beulich wrote:
> >>> On 13.03.17 at 03:36, <yi.y.sun@linux.intel.com> wrote:
> > On 17-03-10 02:09:55, Jan Beulich wrote:
> >> >>> On 10.03.17 at 03:54, <yi.y.sun@linux.intel.com> wrote:
> >> > On 17-03-08 09:07:10, Jan Beulich wrote:
> >> >> >>> On 15.02.17 at 09:49, <yi.y.sun@linux.intel.com> wrote:
> >> >> > +    ref[old_cos]--;
> >> >> > +    spin_unlock(&info->ref_lock);
> >> >> > +
> >> >> > +    /*
> >> >> > +     * Step 6:
> >> >> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> >> >> > +     * which COS the domain is using on the socket. One domain can only use
> >> >> > +     * one COS ID at same time on each socket.
> >> >> > +     */
> >> >> > +    d->arch.psr_cos_ids[socket] = cos;
> >> >> 
> >> >> So the domain has not been paused, i.e. some of its vCPU-s may
> >> >> be running on other pCPU-s (including ones on the socket in
> >> >> question). How come it is safe to update this value here?
> >> >> 
> >> > This is a domctl operation. It is protected by domctl_lock which is locked in
> >> > do_domctl().
> >> 
> >> But that lock doesn't keep the subject domain's vCPU-s from
> >> running on other pCPU-s at the same time.
> >> 
> > Yes, you are right. But only 'psr_ctxt_switch_to()' can access
> > psr_cos_ids[socket] when psr_cos_ids[socket] is being set. 'psr_ctxt_switch_to()'
> > read the cos and set it to ASSOC register. Context switch is short so that the
> > correct cos can be set to ASSOC register in a short time.
> 
> That's a reply you should never give: No matter how short a time
> window, eventually it'll be hit. A very good example of this is the
> VMCS race we've recently fixed (commit 2f4d2198a9), and which
> had been there for years until it was first observed (and after
> that it took another year or so until we've actually managed to
> figure out what's going on).
> 
Sorry for that. Let me explain in details.

There are three scenarios. E.g.
1. User calls domctl interface on Dom0 to set a COS ID 1 for Dom1 into its
   psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
   called which makes COS ID 1 work. For this case, we do not any action.

2. Dom1 runs on CPU 1 and COS ID 1 is working. At same time, user calls domctl
   interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[]. After
   time slice ends, the Dom1 is scheduled again, the new COS ID 2 will work.
   For this case, we don't need any action either.

3. When a new COS ID is being set to psr_cos_ids[], 'psr_ctxt_switch_to()'
   is called to access the same psr_cos_ids[] member through 'psr_assoc_cos'.
   The COS ID is constrained by cos_mask so that it cannot exceeds the cos_max.
   So even the COS ID got here is wrong, it is still a workable ID (within
   cos_max). The functionality is still workable but of course the COS ID may
   not be the one that user intends to use.

If you think scenario 3 is not acceptable, I suggest to add read write lock as
below. How do you think? Thanks!

static void psr_assoc_cos()
{
    read_lock(&rwlock);
    *reg = (*reg & ~cos_mask) |
            (((uint64_t)cos << PSR_ASSOC_REG_SHIFT) & cos_mask);
    read_unlock(&rwlock);
}

int psr_set_val()
{
    ......
    write_lock(&rwlock);
    d->arch.psr_cos_ids[socket] = cos;
    write_unlock(&rwlock);
    ......
}

> Jan
Jan Beulich March 14, 2017, 6:29 a.m. UTC | #14
>>> Yi Sun <yi.y.sun@linux.intel.com> 03/14/17 3:42 AM >>>
>There are three scenarios. E.g.
>1. User calls domctl interface on Dom0 to set a COS ID 1 for Dom1 into its
>psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
>called which makes COS ID 1 work. For this case, we do not any action.
>
>2. Dom1 runs on CPU 1 and COS ID 1 is working. At same time, user calls domctl
>interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[]. After
>time slice ends, the Dom1 is scheduled again, the new COS ID 2 will work.
>For this case, we don't need any action either.

And that's because of? I'd think the domctl caller can expect the new COS ID
to take effect immediately for all vCPU-s of the target domain.

>3. When a new COS ID is being set to psr_cos_ids[], 'psr_ctxt_switch_to()'
>is called to access the same psr_cos_ids[] member through 'psr_assoc_cos'.
>The COS ID is constrained by cos_mask so that it cannot exceeds the cos_max.
>So even the COS ID got here is wrong, it is still a workable ID (within
>cos_max). The functionality is still workable but of course the COS ID may
>not be the one that user intends to use.
>
>If you think scenario 3 is not acceptable, I suggest to add read write lock as
>below. How do you think? Thanks!
>
>static void psr_assoc_cos()
>{
>read_lock(&rwlock);
>*reg = (*reg & ~cos_mask) |
>(((uint64_t)cos << PSR_ASSOC_REG_SHIFT) & cos_mask);
>read_unlock(&rwlock);
>}
>
>int psr_set_val()
>{
>......
>write_lock(&rwlock);
>d->arch.psr_cos_ids[socket] = cos;
>write_unlock(&rwlock);
>......
>}

I don't see how that would help. The domain could then still use a stale COS
ID. I see only two valid approaches: Either you pause the domain during the
update, or you IPI CPUs running vCPU-s of that domain to reload their MSRs.
The latter could become tricky afaict ...

Jan
Yi Sun March 14, 2017, 9:21 a.m. UTC | #15
On 17-03-14 00:29:09, Jan Beulich wrote:
> >>> Yi Sun <yi.y.sun@linux.intel.com> 03/14/17 3:42 AM >>>
> >There are three scenarios. E.g.
> >1. User calls domctl interface on Dom0 to set a COS ID 1 for Dom1 into its
> >psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
> >called which makes COS ID 1 work. For this case, we do not any action.
> >
> >2. Dom1 runs on CPU 1 and COS ID 1 is working. At same time, user calls domctl
> >interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[]. After
> >time slice ends, the Dom1 is scheduled again, the new COS ID 2 will work.
> >For this case, we don't need any action either.
> 
> And that's because of? I'd think the domctl caller can expect the new COS ID
> to take effect immediately for all vCPU-s of the target domain.
> 
> >3. When a new COS ID is being set to psr_cos_ids[], 'psr_ctxt_switch_to()'
> >is called to access the same psr_cos_ids[] member through 'psr_assoc_cos'.
> >The COS ID is constrained by cos_mask so that it cannot exceeds the cos_max.
> >So even the COS ID got here is wrong, it is still a workable ID (within
> >cos_max). The functionality is still workable but of course the COS ID may
> >not be the one that user intends to use.
> >
> >If you think scenario 3 is not acceptable, I suggest to add read write lock as
> >below. How do you think? Thanks!
> >
> >static void psr_assoc_cos()
> >{
> >read_lock(&rwlock);
> >*reg = (*reg & ~cos_mask) |
> >(((uint64_t)cos << PSR_ASSOC_REG_SHIFT) & cos_mask);
> >read_unlock(&rwlock);
> >}
> >
> >int psr_set_val()
> >{
> >......
> >write_lock(&rwlock);
> >d->arch.psr_cos_ids[socket] = cos;
> >write_unlock(&rwlock);
> >......
> >}
> 
> I don't see how that would help. The domain could then still use a stale COS
> ID. I see only two valid approaches: Either you pause the domain during the
> update, or you IPI CPUs running vCPU-s of that domain to reload their MSRs.
> The latter could become tricky afaict ...
> 
For IPI solution, could you please help to check if below codes can work?
Thanks!

struct assoc_write_info
{
    struct domain *d;
};

static void do_write_assoc_msr(void *data)
{
......
    wrmsrl(MSR_IA32_PSR_ASSOC, reg);
......
}

static void write_psr_assoc_msr(struct domain *d)
{
    struct assoc_write_info data = { .d = d };

    cpumask_t *online = cpupool_domain_cpumask(d);

    /* Make all valid cpus execute do_write_assoc_msr. */
    on_selected_cpus(online, do_write_assoc_msr, &data, 0);
}

int psr_set_val(...)
{
......
    d->arch.psr_cos_ids[socket] = cos;
    write_psr_assoc_msr(d);
......
}

BRs,
Sun Yi
Jan Beulich March 14, 2017, 10:24 a.m. UTC | #16
>>> On 14.03.17 at 10:21, <yi.y.sun@linux.intel.com> wrote:
> On 17-03-14 00:29:09, Jan Beulich wrote:
>> >>> Yi Sun <yi.y.sun@linux.intel.com> 03/14/17 3:42 AM >>>
>> >There are three scenarios. E.g.
>> >1. User calls domctl interface on Dom0 to set a COS ID 1 for Dom1 into its
>> >psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
>> >called which makes COS ID 1 work. For this case, we do not any action.
>> >
>> >2. Dom1 runs on CPU 1 and COS ID 1 is working. At same time, user calls 
> domctl
>> >interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[]. After
>> >time slice ends, the Dom1 is scheduled again, the new COS ID 2 will work.
>> >For this case, we don't need any action either.
>> 
>> And that's because of? I'd think the domctl caller can expect the new COS ID
>> to take effect immediately for all vCPU-s of the target domain.
>> 
>> >3. When a new COS ID is being set to psr_cos_ids[], 'psr_ctxt_switch_to()'
>> >is called to access the same psr_cos_ids[] member through 'psr_assoc_cos'.
>> >The COS ID is constrained by cos_mask so that it cannot exceeds the cos_max.
>> >So even the COS ID got here is wrong, it is still a workable ID (within
>> >cos_max). The functionality is still workable but of course the COS ID may
>> >not be the one that user intends to use.
>> >
>> >If you think scenario 3 is not acceptable, I suggest to add read write lock 
> as
>> >below. How do you think? Thanks!
>> >
>> >static void psr_assoc_cos()
>> >{
>> >read_lock(&rwlock);
>> >*reg = (*reg & ~cos_mask) |
>> >(((uint64_t)cos << PSR_ASSOC_REG_SHIFT) & cos_mask);
>> >read_unlock(&rwlock);
>> >}
>> >
>> >int psr_set_val()
>> >{
>> >......
>> >write_lock(&rwlock);
>> >d->arch.psr_cos_ids[socket] = cos;
>> >write_unlock(&rwlock);
>> >......
>> >}
>> 
>> I don't see how that would help. The domain could then still use a stale COS
>> ID. I see only two valid approaches: Either you pause the domain during the
>> update, or you IPI CPUs running vCPU-s of that domain to reload their MSRs.
>> The latter could become tricky afaict ...
>> 
> For IPI solution, could you please help to check if below codes can work?

Well, I did say "tricky", didn't I? What you suggest still leaves a
time window, as the result of cpupool_domain_cpumask() may
change behind your back. The simplest (and least tricky) solution
would be to IPI all CPUs, and check whether any adjustment is
needed inside the handler. However, you'd then need to make
sure there's no context switch related race, similar to the VMCS
one I've referred you to as example.

Jan
Yi Sun March 15, 2017, 2:52 a.m. UTC | #17
On 17-03-14 04:24:51, Jan Beulich wrote:
> >>> On 14.03.17 at 10:21, <yi.y.sun@linux.intel.com> wrote:
> > On 17-03-14 00:29:09, Jan Beulich wrote:
> >> >>> Yi Sun <yi.y.sun@linux.intel.com> 03/14/17 3:42 AM >>>
> >> >There are three scenarios. E.g.
> >> >1. User calls domctl interface on Dom0 to set a COS ID 1 for Dom1 into its
> >> >psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
> >> >called which makes COS ID 1 work. For this case, we do not any action.
> >> >
> >> >2. Dom1 runs on CPU 1 and COS ID 1 is working. At same time, user calls 
> > domctl
> >> >interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[]. After
> >> >time slice ends, the Dom1 is scheduled again, the new COS ID 2 will work.
> >> >For this case, we don't need any action either.
> >> 
> >> And that's because of? I'd think the domctl caller can expect the new COS ID
> >> to take effect immediately for all vCPU-s of the target domain.
> >> 
> >> >3. When a new COS ID is being set to psr_cos_ids[], 'psr_ctxt_switch_to()'
> >> >is called to access the same psr_cos_ids[] member through 'psr_assoc_cos'.
> >> >The COS ID is constrained by cos_mask so that it cannot exceeds the cos_max.
> >> >So even the COS ID got here is wrong, it is still a workable ID (within
> >> >cos_max). The functionality is still workable but of course the COS ID may
> >> >not be the one that user intends to use.
> >> >
> >> >If you think scenario 3 is not acceptable, I suggest to add read write lock 
> > as
> >> >below. How do you think? Thanks!
> >> >
> >> >static void psr_assoc_cos()
> >> >{
> >> >read_lock(&rwlock);
> >> >*reg = (*reg & ~cos_mask) |
> >> >(((uint64_t)cos << PSR_ASSOC_REG_SHIFT) & cos_mask);
> >> >read_unlock(&rwlock);
> >> >}
> >> >
> >> >int psr_set_val()
> >> >{
> >> >......
> >> >write_lock(&rwlock);
> >> >d->arch.psr_cos_ids[socket] = cos;
> >> >write_unlock(&rwlock);
> >> >......
> >> >}
> >> 
> >> I don't see how that would help. The domain could then still use a stale COS
> >> ID. I see only two valid approaches: Either you pause the domain during the
> >> update, or you IPI CPUs running vCPU-s of that domain to reload their MSRs.
> >> The latter could become tricky afaict ...
> >> 
> > For IPI solution, could you please help to check if below codes can work?
> 
> Well, I did say "tricky", didn't I? What you suggest still leaves a
> time window, as the result of cpupool_domain_cpumask() may
> change behind your back. The simplest (and least tricky) solution
> would be to IPI all CPUs, and check whether any adjustment is
> needed inside the handler. However, you'd then need to make
> sure there's no context switch related race, similar to the VMCS
> one I've referred you to as example.
> 
Sorry, I may not fully understand your meaning. My thoughts are below.
1. We set 'd->arch.psr_cos_ids[socket] = cos;' in 'psr_set_val';

2. After that, we get valid cpumask through cpupool_domain_cpumask();

3. If the actual valid cpumask changed after that, the new cpu is valid so
   that the context switch happens. Then 'psr_ctxt_switch_to' is called to
   update the new cpu's ASSOC register with the new COS ID which has been
   set in step 1.

4. Send IPI to all cpus on cpumask got in step 2. They will update their
   ASSOC registers according to their domains psr_cos_ids[].

So I think this flow can cover all cpus which ASSOC registers need be updated.

What is your idea? Thanks!

BRs,
Sun Yi
Jan Beulich March 15, 2017, 7:40 a.m. UTC | #18
>>> On 15.03.17 at 03:52, <yi.y.sun@linux.intel.com> wrote:
> Sorry, I may not fully understand your meaning. My thoughts are below.
> 1. We set 'd->arch.psr_cos_ids[socket] = cos;' in 'psr_set_val';
> 
> 2. After that, we get valid cpumask through cpupool_domain_cpumask();
> 
> 3. If the actual valid cpumask changed after that, the new cpu is valid so
>    that the context switch happens. Then 'psr_ctxt_switch_to' is called to
>    update the new cpu's ASSOC register with the new COS ID which has been
>    set in step 1.
> 
> 4. Send IPI to all cpus on cpumask got in step 2. They will update their
>    ASSOC registers according to their domains psr_cos_ids[].
> 
> So I think this flow can cover all cpus which ASSOC registers need be 
> updated.

You writing it down this way makes me realize that the IPI approach
can't work this way at all: It leaves a time window (between 1 and 2)
where the domain may have vCPU-s running with the wrong COS ID.
I think pausing the vCPU is unavoidable, unless it can be explained
that running with the wrong COS ID for a brief period of time is not
really a problem. I do realize that the pausing approach has its own
difficulty wrt Dom0, but I think that's solvable (e.g. by doing the
actual work in a tasklet instead of in the context of a Dom0 vCPU).

Jan
Yi Sun March 15, 2017, 8:18 a.m. UTC | #19
On 17-03-15 01:40:06, Jan Beulich wrote:
> >>> On 15.03.17 at 03:52, <yi.y.sun@linux.intel.com> wrote:
> > Sorry, I may not fully understand your meaning. My thoughts are below.
> > 1. We set 'd->arch.psr_cos_ids[socket] = cos;' in 'psr_set_val';
> > 
> > 2. After that, we get valid cpumask through cpupool_domain_cpumask();
> > 
> > 3. If the actual valid cpumask changed after that, the new cpu is valid so
> >    that the context switch happens. Then 'psr_ctxt_switch_to' is called to
> >    update the new cpu's ASSOC register with the new COS ID which has been
> >    set in step 1.
> > 
> > 4. Send IPI to all cpus on cpumask got in step 2. They will update their
> >    ASSOC registers according to their domains psr_cos_ids[].
> > 
> > So I think this flow can cover all cpus which ASSOC registers need be 
> > updated.
> 
> You writing it down this way makes me realize that the IPI approach
> can't work this way at all: It leaves a time window (between 1 and 2)
> where the domain may have vCPU-s running with the wrong COS ID.
> I think pausing the vCPU is unavoidable, unless it can be explained
> that running with the wrong COS ID for a brief period of time is not
> really a problem. I do realize that the pausing approach has its own
> difficulty wrt Dom0, but I think that's solvable (e.g. by doing the
> actual work in a tasklet instead of in the context of a Dom0 vCPU).
> 
I have below thoughts.
1. We can use domain_pause for all domains except Dom0 to update COS ID.
2. PSR features are to set cache capacity for a domain. The setting to
   cache is progressively effective. When the cache setting becomes really
   effective, the time slice to schedule a domain may have passed. Moreover,
   even a wrong COS ID is used to set ASSOC, only another CBM be effective
   for a short time. In next Dom0 schedule, the correct CBM will take effect.
   So can we just leave Dom0 setting as current implementation?

Thanks,
Sun Yi
Jan Beulich March 15, 2017, 8:32 a.m. UTC | #20
>>> On 15.03.17 at 09:18, <yi.y.sun@linux.intel.com> wrote:
> On 17-03-15 01:40:06, Jan Beulich wrote:
>> >>> On 15.03.17 at 03:52, <yi.y.sun@linux.intel.com> wrote:
>> > Sorry, I may not fully understand your meaning. My thoughts are below.
>> > 1. We set 'd->arch.psr_cos_ids[socket] = cos;' in 'psr_set_val';
>> > 
>> > 2. After that, we get valid cpumask through cpupool_domain_cpumask();
>> > 
>> > 3. If the actual valid cpumask changed after that, the new cpu is valid so
>> >    that the context switch happens. Then 'psr_ctxt_switch_to' is called to
>> >    update the new cpu's ASSOC register with the new COS ID which has been
>> >    set in step 1.
>> > 
>> > 4. Send IPI to all cpus on cpumask got in step 2. They will update their
>> >    ASSOC registers according to their domains psr_cos_ids[].
>> > 
>> > So I think this flow can cover all cpus which ASSOC registers need be 
>> > updated.
>> 
>> You writing it down this way makes me realize that the IPI approach
>> can't work this way at all: It leaves a time window (between 1 and 2)
>> where the domain may have vCPU-s running with the wrong COS ID.
>> I think pausing the vCPU is unavoidable, unless it can be explained
>> that running with the wrong COS ID for a brief period of time is not
>> really a problem. I do realize that the pausing approach has its own
>> difficulty wrt Dom0, but I think that's solvable (e.g. by doing the
>> actual work in a tasklet instead of in the context of a Dom0 vCPU).
>> 
> I have below thoughts.
> 1. We can use domain_pause for all domains except Dom0 to update COS ID.
> 2. PSR features are to set cache capacity for a domain. The setting to
>    cache is progressively effective. When the cache setting becomes really
>    effective, the time slice to schedule a domain may have passed. Moreover,
>    even a wrong COS ID is used to set ASSOC, only another CBM be effective
>    for a short time. In next Dom0 schedule, the correct CBM will take effect.
>    So can we just leave Dom0 setting as current implementation?

If at all possible I'd like to avoid special casing Dom0. Please keep in
mind that in disaggregated environments domctl-s may be issued by
other than Dom0. So either the argumentation above to use the
current implementation is okay for all domains (and is reasonable to
expect to hold for possible future extensions), or all domains need
to be taken care of by a changed approach.

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 098e399..f8f5539 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1422,21 +1422,21 @@  long arch_do_domctl(
         switch ( domctl->u.psr_cat_op.cmd )
         {
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3);
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_CBM_TYPE_L3);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_CODE);
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_CBM_TYPE_L3_CODE);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_DATA);
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_CBM_TYPE_L3_DATA);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index c1afd36..d414b5e 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -546,18 +546,214 @@  int psr_get_val(struct domain *d, unsigned int socket,
     return psr_get(socket, type, NULL, 0, d, val);
 }
 
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t cbm, enum cbm_type type)
+/* Set value functions */
+static unsigned int get_cos_num(const struct psr_socket_info *info)
 {
     return 0;
 }
 
+static int assemble_val_array(uint64_t *val,
+                              uint32_t array_len,
+                              const struct psr_socket_info *info,
+                              unsigned int old_cos)
+{
+    return -EINVAL;
+}
+
+static int set_new_val_to_array(uint64_t *val,
+                                uint32_t array_len,
+                                const struct psr_socket_info *info,
+                                enum psr_feat_type feat_type,
+                                enum cbm_type type,
+                                uint64_t m)
+{
+    return -EINVAL;
+}
+
+static int find_cos(const uint64_t *val, uint32_t array_len,
+                    enum psr_feat_type feat_type,
+                    const struct psr_socket_info *info)
+{
+    return -ENOENT;
+}
+
+static int pick_avail_cos(const struct psr_socket_info *info,
+                          const uint64_t *val, uint32_t array_len,
+                          unsigned int old_cos,
+                          enum psr_feat_type feat_type)
+{
+    return -ENOENT;
+}
+
+static int write_psr_msr(unsigned int socket, unsigned int cos,
+                         const uint64_t *val)
+{
+    return -ENOENT;
+}
+
+int psr_set_val(struct domain *d, unsigned int socket,
+                uint64_t val, enum cbm_type type)
+{
+    unsigned int old_cos;
+    int cos, ret;
+    unsigned int *ref;
+    uint64_t *val_array;
+    struct psr_socket_info *info = get_socket_info(socket);
+    uint32_t array_len;
+    enum psr_feat_type feat_type;
+
+    if ( IS_ERR(info) )
+        return PTR_ERR(info);
+
+    feat_type = psr_cbm_type_to_feat_type(type);
+    if ( !test_bit(feat_type, &info->feat_mask) )
+        return -ENOENT;
+
+    /*
+     * Step 0:
+     * old_cos means the COS ID current domain is using. By default, it is 0.
+     *
+     * For every COS ID, there is a reference count to record how many domains
+     * are using the COS register corresponding to this COS ID.
+     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
+     * - If ref[old_cos] is 1, that means this COS is only used by current
+     *   domain.
+     * - If ref[old_cos] is more than 1, that mean multiple domains are using
+     *   this COS.
+     */
+    old_cos = d->arch.psr_cos_ids[socket];
+    if ( old_cos > MAX_COS_REG_CNT )
+        return -EOVERFLOW;
+
+    ref = info->cos_ref;
+
+    /*
+     * Step 1:
+     * Assemle a value array to store all featues cos_reg_val[old_cos].
+     * And, set the input val into array according to the feature's
+     * position in array.
+     */
+    array_len = get_cos_num(info);
+    val_array = xzalloc_array(uint64_t, array_len);
+    if ( !val_array )
+        return -ENOMEM;
+
+    if ( (ret = assemble_val_array(val_array, array_len, info, old_cos)) != 0 )
+    {
+        xfree(val_array);
+        return ret;
+    }
+
+    if ( (ret = set_new_val_to_array(val_array, array_len, info,
+                                     feat_type, type, val)) != 0 )
+    {
+        xfree(val_array);
+        return ret;
+    }
+
+    /*
+     * Lock here to make sure the ref is not changed during find and
+     * write process.
+     */
+    spin_lock(&info->ref_lock);
+
+    /*
+     * Step 2:
+     * Try to find if there is already a COS ID on which all features' values
+     * are same as the array. Then, we can reuse this COS ID.
+     */
+    cos = find_cos(val_array, array_len, feat_type, info);
+    if ( cos >= 0 )
+    {
+        if ( cos == old_cos )
+        {
+            spin_unlock(&info->ref_lock);
+            xfree(val_array);
+            return 0;
+        }
+    }
+    else
+    {
+        /*
+         * Step 3:
+         * If fail to find, we need allocate a new COS ID.
+         * If multiple domains are using same COS ID, its ref is more
+         * than 1. That means we cannot free this COS to make current domain
+         * use it. Because other domains are using the value saved in the COS.
+         * Unless the ref is changed to 1 (mean only current domain is using
+         * it), we cannot allocate the COS ID to current domain.
+         * So, only the COS ID which ref is 1 or 0 can be allocated.
+         */
+        cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);
+        if ( cos < 0 )
+        {
+            spin_unlock(&info->ref_lock);
+            xfree(val_array);
+            return cos;
+        }
+
+        /*
+         * Step 4:
+         * Write all features MSRs according to the COS ID.
+         */
+        ret = write_psr_msr(socket, cos, val_array);
+        if ( ret )
+        {
+            spin_unlock(&info->ref_lock);
+            xfree(val_array);
+            return ret;
+        }
+    }
+
+    /*
+     * Step 5:
+     * Update ref according to COS ID.
+     */
+    ref[cos]++;
+    ASSERT(ref[cos] || cos == 0);
+    ref[old_cos]--;
+    spin_unlock(&info->ref_lock);
+
+    /*
+     * Step 6:
+     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
+     * which COS the domain is using on the socket. One domain can only use
+     * one COS ID at same time on each socket.
+     */
+    d->arch.psr_cos_ids[socket] = cos;
+    xfree(val_array);
+
+    return 0;
+}
+
 /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
 static void psr_free_cos(struct domain *d)
 {
-    if( !d->arch.psr_cos_ids )
+    unsigned int socket, cos;
+
+    if ( !d->arch.psr_cos_ids )
         return;
 
+    /* Domain is free so its cos_ref should be decreased. */
+    for ( socket = 0; socket < nr_sockets; socket++ )
+    {
+        struct psr_socket_info *info;
+
+        /* cos 0 is default one which does not need be handled. */
+        if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
+            continue;
+
+        /*
+         * If domain uses other cos ids, all corresponding refs must have been
+         * increased 1 for this domain. So, we need decrease them.
+         */
+        info = socket_info + socket;
+        ASSERT(info->cos_ref[cos] || cos == 0);
+        spin_lock(&info->ref_lock);
+        info->cos_ref[cos]--;
+        spin_unlock(&info->ref_lock);
+    }
+
     xfree(d->arch.psr_cos_ids);
     d->arch.psr_cos_ids = NULL;
 }
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index 569e7df..fde7882 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -72,8 +72,8 @@  int psr_get_info(unsigned int socket, enum cbm_type type,
                  uint32_t data[], unsigned int array_len);
 int psr_get_val(struct domain *d, unsigned int socket,
                 uint64_t *val, enum cbm_type type);
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t cbm, enum cbm_type type);
+int psr_set_val(struct domain *d, unsigned int socket,
+                uint64_t val, enum cbm_type type);
 
 int psr_domain_init(struct domain *d);
 void psr_domain_free(struct domain *d);