diff mbox

[1/3] x86: refactor psr implementation in hypervisor.

Message ID 1472102537-7155-1-git-send-email-yi.y.sun@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yi Sun Aug. 25, 2016, 5:22 a.m. UTC
Current psr.c is designed for supporting L3 CAT/CDP. It has many
limitations to add new feature. Considering to support more PSR
features, we need refactor PSR implementation to make it more
flexible and fulfill the principle, open for extension but closed
for modification.

The core of the refactoring is to abstract the common actions and
encapsulate them into "struct feat_ops". The detailed steps to add
a new feature are described at the head of psr.c.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 xen/arch/x86/domctl.c     |   36 +-
 xen/arch/x86/psr.c        | 1121 +++++++++++++++++++++++++++++++++++----------
 xen/arch/x86/sysctl.c     |    9 +-
 xen/include/asm-x86/psr.h |   21 +-
 4 files changed, 912 insertions(+), 275 deletions(-)

Comments

Jan Beulich Sept. 6, 2016, 7:40 a.m. UTC | #1
>>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -23,22 +23,116 @@
>  #define PSR_CAT        (1<<1)
>  #define PSR_CDP        (1<<2)
>  
> -struct psr_cat_cbm {
> -    union {
> -        uint64_t cbm;
> -        struct {
> -            uint64_t code;
> -            uint64_t data;
> -        };
> -    };
> +/*
> + * To add a new PSR feature, please follow below steps:
> + * 1. Implement feature specific callback functions listed in
> + *    "struct feat_ops";
> + * 2. Implement a feature specific "struct feat_ops" object and register
> + *    feature specific callback functions into it;
> + * 3. Delcare feat_list object for the feature and malloc memory for it
> + *    in internal_cpu_prepare(). Correspondingly, free them in
> + *    free_feature();
> + * 4. Add feature initialization codes in internal_cpu_init().
> + */
> +
> +struct feat_list;
> +struct psr_socket_alloc_info;
> +
> +/* Every feature enabled should implement such ops and callback functions. */
> +struct feat_ops {
> +    /*
> +     * init_feature is used in cpu initialization process to do feature
> +     * specific initialization works.
> +     */
> +    void (*init_feature)(unsigned int eax, unsigned int ebx,
> +                         unsigned int ecx, unsigned int edx,
> +                         struct feat_list *pFeat,

Here and below - we don't normally use identifiers with capital letters
in the middle.

> +                         struct psr_socket_alloc_info *info);
> +    /*
> +     * get_old_set_new is used in set value process to get all features'
> +     * COS registers values according to original cos id of the domain.
> +     * Then, assemble them into an mask array as feature list order.

This sentence in particular doesn't make any sense to me. What
follows below also looks like it is in need of improvement.

> +     * Then, set the new feature value into feature corresponding position
> +     * in the array. This function is used to pair with compare_mask.
> +     */
> +    int (*get_old_set_new)(uint64_t *mask,
> +                           struct feat_list *pFeat,
> +                           unsigned int old_cos,
> +                           enum mask_type type,
> +                           uint64_t m);
> +    /*
> +     * compare_mask is used to in set value process to compare if the
> +     * input mask array can match all the features' COS registers values
> +     * according to input cos id.
> +     */
> +    int (*compare_mask)(uint64_t *mask, struct feat_list *pFeat,
> +                        unsigned int cos, bool_t *found);

For a "compare" function I'd expect a word on the meaning of the
return value, and I'd expect most if not all pointer parameters to
be pointers to const.

Also plain bool please.

> +    /*
> +     * get_cos_max_as_type is used to get the cos_max value of the feature
> +     * according to input mask_type.
> +     */
> +    unsigned int (*get_cos_max_as_type)(struct feat_list *pFeat,
> +                                        enum mask_type type);

DYM mean "get_cos_max_from_type()"?

> +    /*
> +     * exceed_range is used to check if the input cos id exceeds the
> +     * feature's cos_max and if the input mask value is not the default one.
> +     * Even if the associated cos exceeds the cos_max, HW can work as default
> +     * value. That is the reason we need check is mask value is default one.
> +     * If both criteria are fulfilled, that means the input exceeds the
> +     * range.
> +     */
> +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
> +                                 unsigned int cos);

According to the comment this is kind of a predicate, which seems
unlikely to return an unsigned value. In fact without a word on the
return value I'd expect such to return bool. And I'd also expect the
name to reflect the purpose, i.e. "exceeds_name()". Plus just like
for compare above I wonder whether come or all of the parameters
should be pointers to const (please go over the entire patch and do
constification wherever possible/sensible).

> +    /*
> +     * write_msr is used to write feature specific MSR registers.
> +     */

This is a single line comment.

> +    int (*write_msr)(unsigned int cos, uint64_t *mask,
> +                     struct feat_list *pFeat);

Such write operations don't normally alter the value to be written.
If that's different here, that's perhaps worth a word in the comment.
If it isn't, then I don't see why the address of the value gets passed
instead of the value itself.

Oh, having gone further in the patch I see this is to update multiple
MSRs. I guess this would best be reflected by making the parameter
const uint64_t masks[] (albeit it's also not clear to me why this
necessarily have to be masks kind items, and not just arbitrary
values - proper choice of variable and parameter names helps the
understanding).

> +#define MAX_FEAT_INFO_SIZE 8
> +#define MAX_COS_REG_NUM  128

Are these numbers arbitrary, or based on something?

> +struct psr_socket_alloc_info {

I've yet to see whether the "alloc" in the name really makes sense.

> @@ -58,8 +150,427 @@ static unsigned int __read_mostly opt_cos_max = 255;
>  static uint64_t rmid_mask;
>  static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
>  
> -static struct psr_cat_cbm *temp_cos_to_cbm;
> +static struct psr_ref *temp_cos_ref;
> +/* Every feature has its own object. */
> +struct feat_list *pL3CAT;

static. And a better name please, more in line with the other variable
or similar purpose.

> +/* Common functions for supporting feature callback functions. */
> +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
> +{
> +    if ( NULL == pHead || NULL == pTmp )
> +        return;
> +
> +    while ( pHead->pNext )
> +        pHead = pHead->pNext;
> +
> +    pHead->pNext = pTmp;
> +}

Do you really need custom list management here?

> +static void free_feature(struct psr_socket_alloc_info *info)
> +{
> +    struct feat_list *pTmp;
> +    struct feat_list *pPrev;
> +
> +    if ( NULL == info )
> +        return;
> +
> +    if ( NULL == info->pFeat )
> +        return;
> +
> +    pTmp = info->pFeat->pNext;
> +    while ( pTmp )
> +    {
> +        pPrev = pTmp;
> +        pTmp = pTmp->pNext;
> +        clear_bit(pPrev->feature, &(info->features));
> +        xfree(pPrev);
> +        pPrev = NULL;
> +    }
> +}

Why does info->pFeat not get freed here?

> +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
> +                                unsigned int ecx, unsigned int edx,
> +                                struct feat_list *pFeat,
> +                                struct psr_socket_alloc_info *info)
> +{
> +    struct psr_cat_lvl_info l3_cat;
> +    unsigned int socket;
> +    uint64_t val;
> +
> +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
> +        return;

DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
array is one with unsigned int as the base type, which doesn't get
reflected here.

> +    /* No valid value so do not enable feature. */
> +    if ( 0 == eax || 0 == edx )
> +        return;
> +
> +    l3_cat.cbm_len = (eax & 0x1f) + 1;
> +    l3_cat.cos_max = min(opt_cos_max, edx & 0xffff);
>  
> +    /* cos=0 is reserved as default cbm(all ones). */
> +    pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> +
> +    pFeat->feature = PSR_SOCKET_L3_CAT;
> +    set_bit(PSR_SOCKET_L3_CAT, &(info->features));

Does this need to be set_bit(), or would __set_bit() suffice? Also
there are many cases of this strange &() construct - the
parentheses are pointless there.

> +    if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> +         !test_bit(PSR_SOCKET_L3_CDP, &(info->features)) )
> +    {
> +        /* Data */
> +        pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> +        /* Code */
> +        pFeat->cos_reg_val[1] = (1ull << l3_cat.cbm_len) - 1;
> +
> +        /* We only write mask1 since mask0 is always all ones by default. */
> +        wrmsrl(MSR_IA32_PSR_L3_MASK(1),
> +               (1ull << l3_cat.cbm_len) - 1);
> +        rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> +        wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
> +
> +        /* Cut half of cos_max when CDP is enabled. */
> +        l3_cat.cos_max >>= 1;
> +
> +        pFeat->feature = PSR_SOCKET_L3_CDP;
> +        set_bit(PSR_SOCKET_L3_CDP, &(info->features));
> +        clear_bit(PSR_SOCKET_L3_CAT, &(info->features));

A few lines up you set this bit, and now you clear it again. Please
instead set it in an else branch below.

> +    }
> +
> +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));

Aforementioned BUILD_BUG_ON() would probably bets be placed
right ahead of this memcpy().

> +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> +                               unsigned int cos, bool_t *found)
> +{
> +    struct psr_cat_lvl_info cat_info;
> +    uint64_t l3_def_cbm;
> +
> +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));

Already here I think this memcpy()ing gets unwieldy. Can't you
simply make the structure field a union of all types of interest?

> +    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
> +
> +    /* CDP */
> +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> +    {
> +        if ( cos > cat_info.cos_max )
> +        {
> +            if ( mask[0] != l3_def_cbm ||
> +                 mask[1] != l3_def_cbm )
> +            {
> +                *found = 0;

false and true for literal boolean values please.

> +                printk(XENLOG_ERR "CDP exceed cos max.\n");

I haven't figured yet at what times this function will get called, but
I suspect it's inappropriate for it to log messages.

> +                return -ENOENT;
> +            }
> +            *found = 1;
> +            return 2;

Who or what is 2?

> +        }
> +
> +        if ( mask[0] == pFeat->cos_reg_val[cos * 2] &&
> +             mask[1] == pFeat->cos_reg_val[cos * 2 + 1])
> +            *found = 1;
> +        else
> +            *found = 0;
> +
> +        return 2;
> +    }
> +
> +    /* CAT */
> +    if ( cos > cat_info.cos_max )
> +    {
> +        if ( mask[0] != l3_def_cbm )
> +        {
> +            *found = 0;
> +            printk(XENLOG_ERR "CAT exceed cos max.\n");
> +            return -ENOENT;
> +        }
> +        *found = 1;
> +        return 1;
> +    }
> +
> +    if ( mask[0] == pFeat->cos_reg_val[cos] )
> +        *found = 1;
> +    else
> +        *found = 0;

Please use a simple assignment in cases like this, instead of if/else.

> +static unsigned int l3_cat_exceed_range(uint64_t *mask, struct feat_list *pFeat,
> +                                        unsigned int cos)
> +{
> +    struct psr_cat_lvl_info cat_info;
> +    uint64_t l3_def_cbm;
> +
> +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> +    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
> +
> +    /* CDP */
> +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> +    {
> +        if ( cos > cat_info.cos_max )
> +            if ( mask[0] != l3_def_cbm ||
> +                 mask[1] != l3_def_cbm )

Please combine such if()s.

> +                /*
> +                 * Exceed cos_max and value to set is not default,
> +                 * return error.
> +                 */
> +                return 0;
> +
> +        return 2;

Same question as above - who or what is 2?

> +static int l3_cat_get_old_set_new(uint64_t *mask,
> +                                  struct feat_list *pFeat,
> +                                  unsigned int old_cos,
> +                                  enum mask_type type,
> +                                  uint64_t m)
> +{
> +    struct psr_cat_lvl_info cat_info;
> +
> +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> +
> +    if ( type == PSR_MASK_TYPE_L3_DATA ||
> +         type == PSR_MASK_TYPE_L3_CODE ||
> +         type == PSR_MASK_TYPE_L3_CBM )
> +    {
> +        if ( !psr_check_cbm(cat_info.cbm_len, m) )
> +            return -EINVAL;
> +    }
> +
> +    if ( ( type == PSR_MASK_TYPE_L3_DATA ||
> +         type == PSR_MASK_TYPE_L3_CODE ) &&
> +         pFeat->feature != PSR_SOCKET_L3_CDP )
> +        return -ENXIO;

Please either combine the two if()s into a single switch() (with
appropriate fall through behavior) or, perhaps better, move
this second if() - suitably shrunk - past the following one. This is
to avoid redundant code, which makes things more difficult to
follow.

> +    /* CDP */
> +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> +    {
> +        if ( old_cos > cat_info.cos_max )
> +        {
> +            /* Data */
> +            mask[0] = pFeat->cos_reg_val[0];
> +            /* Code */
> +            mask[1] = pFeat->cos_reg_val[1];
> +        }
> +        else
> +        {
> +            /* Data */
> +            mask[0] = pFeat->cos_reg_val[old_cos * 2];
> +            /* Code */
> +            mask[1] = pFeat->cos_reg_val[old_cos * 2 + 1];
> +        }
> +
> +        /* Set new mask */
> +        if ( type == PSR_MASK_TYPE_L3_DATA )
> +            mask[0] = m;
> +        if ( type == PSR_MASK_TYPE_L3_CODE )
> +            mask[1] = m;
> +
> +        return 2;
> +    }
> +
> +    /* CAT */
> +    if ( old_cos > cat_info.cos_max )
> +        mask[0] =  pFeat->cos_reg_val[0];
> +    else
> +        mask[0] =  pFeat->cos_reg_val[old_cos];

Across the entire function - wouldn't it be easier to range check
old_cos once up front, forcing it to zero if the check fails? That
would reduce code size to a better readable amount. (And just to
avoid any misunderstanding - comments like this also may apply
to other functions; I'm not going to repeat them there.)

Also note that there are undue double blanks in here.

> +    if ( type == PSR_MASK_TYPE_L3_CBM )
> +        mask[0] = m;

This overwriting behavior also looks quite strange to me. What's
the purpose? And if this really is meant to be that way, why is
this not (leaving aside the other suggested adjustment)

    if ( type == PSR_MASK_TYPE_L3_CBM )
        mask[0] = m;
    else if ( old_cos > cat_info.cos_max )
        mask[0] = pFeat->cos_reg_val[0];
    else
        mask[0] = pFeat->cos_reg_val[old_cos];

?

> +static int l3_cat_get_feat_info(struct feat_list *pFeat, enum mask_type type,
> +                                uint32_t *dat0, uint32_t *dat1,
> +                                uint32_t *dat2)
> +{
> +    struct psr_cat_lvl_info cat_info;
> +
> +    if ( type != PSR_MASK_TYPE_L3_DATA &&
> +         type != PSR_MASK_TYPE_L3_CODE &&
> +         type != PSR_MASK_TYPE_L3_CBM )
> +        return 0;
> +
> +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> +
> +    *dat0 = cat_info.cbm_len;
> +    *dat1 = cat_info.cos_max;
> +
> +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> +        *dat2 |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> +    else
> +        *dat2 = 0;
> +
> +    return 1;
> +}

The latest here it becomes clear that you want to use switch()
much more widely.

>  static inline void psr_assoc_init(void)
>  {
>      struct psr_assoc *psra = &this_cpu(psr_assoc);
> +    struct psr_socket_alloc_info *info;
> +    unsigned int cos_max;
> +    unsigned int socket;
>  
> -    if ( cat_socket_info )
> +    if ( socket_alloc_info )
>      {
> -        unsigned int socket = cpu_to_socket(smp_processor_id());
> +        socket = cpu_to_socket(smp_processor_id());

What's the point of moving the variable declaration? Quite the
other way around, the new variable you add could - afaict - all
live in this more narrow scope. And info could presumably become
a pointer to const.

> -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
> -                        uint32_t *cos_max, uint32_t *flags)
> +int psr_get_info(unsigned int socket, enum mask_type type,
> +                 uint32_t *dat0, uint32_t *dat1,
> +                 uint32_t *dat2)
>  {
> -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> +    struct feat_list *pTmp;
>  
>      if ( IS_ERR(info) )
>          return PTR_ERR(info);
>  
> -    *cbm_len = info->cbm_len;
> -    *cos_max = info->cos_max;
> +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
> +       return -ENODEV;
>  
> -    *flags = 0;
> -    if ( cdp_is_enabled(socket) )
> -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> +    pTmp = info->pFeat->pNext;
> +    while ( pTmp )
> +    {
> +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )

Considering that this is one of the generic ops: What guarantees
that 3 elements will suffice going forward? I.e. why is this not an
array?

> -int psr_get_l3_cbm(struct domain *d, unsigned int socket,
> -                   uint64_t *cbm, enum cbm_type type)
> +static int get_old_set_new(uint64_t *mask,
> +                           struct psr_socket_alloc_info *info,
> +                           unsigned int old_cos,
> +                           enum mask_type type,
> +                           uint64_t m)
>  {
> -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> -    bool_t cdp_enabled = cdp_is_enabled(socket);
> -
> -    if ( IS_ERR(info) )
> -        return PTR_ERR(info);
> +    struct feat_list *pTmp;
> +    int ret;
>  
> -    switch ( type )
> -    {
> -    case PSR_CBM_TYPE_L3:
> -        if ( cdp_enabled )
> -            return -EXDEV;
> -        *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> -        break;
> +    if ( !mask )
> +        return -EINVAL;
>  
> -    case PSR_CBM_TYPE_L3_CODE:
> -        if ( !cdp_enabled )
> -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> -        else
> -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].code;
> -        break;
> +    if ( !info || !info->pFeat )
> +        return -ENODEV;

Earlier on you used (slightly oddly arranged) checks against NULL.
Please be consistent (and I'd prefer if you used ! across the board).

> -    case PSR_CBM_TYPE_L3_DATA:
> -        if ( !cdp_enabled )
> -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> -        else
> -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
> -        break;
> +    pTmp = info->pFeat->pNext;
> +    while ( pTmp )
> +    {
> +        /* mask getting and setting order is same as feature list */
> +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
> +        if ( ret < 0 )
> +            return ret;
>  
> -    default:
> -        ASSERT_UNREACHABLE();
> +        mask += ret;

With this - what ensures the array isn't going to be overrun?

And I'm also having some difficulty seeing how an array of outputs
matches up with exactly one input (m). Can you explain this please?

> -static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
> +int psr_get_val(struct domain *d, unsigned int socket,
> +                uint64_t *val, enum mask_type type)
>  {
> -    unsigned int first_bit, zero_bit;
> -
> -    /* Set bits should only in the range of [0, cbm_len). */
> -    if ( cbm & (~0ull << cbm_len) )
> -        return 0;
> +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> +    unsigned int cos = d->arch.psr_cos_ids[socket];
> +    struct feat_list *pTmp;
> +    int ret;
>  
> -    /* At least one bit need to be set. */
> -    if ( cbm == 0 )
> -        return 0;
> +    if ( IS_ERR(info) )
> +        return PTR_ERR(info);
>  
> -    first_bit = find_first_bit(&cbm, cbm_len);
> -    zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
> +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext )
> +        return -ENODEV;
>  
> -    /* Set bits should be contiguous. */
> -    if ( zero_bit < cbm_len &&
> -         find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
> -        return 0;
> +    pTmp = info->pFeat->pNext;
> +    while ( pTmp )
> +    {
> +        ret = pTmp->ops.get_val(pTmp, cos, type, val);
> +        if ( ret < 0 )
> +            return -EINVAL;

Why don't you return "ret" here?

> +        else if ( ret > 0 )

Pointless "else".

> -static void do_write_l3_cbm(void *data)
> +static void do_write_psr_ref(void *data)

What does "_ref" stand for?

> -static int find_cos(struct psr_cat_cbm *map, unsigned int cos_max,
> -                    uint64_t cbm_code, uint64_t cbm_data, bool_t cdp_enabled)
> +static int find_cos(uint64_t *mask, enum mask_type type,
> +                    struct psr_socket_alloc_info *info)
>  {
>      unsigned int cos;
> +    struct psr_ref *map = info->cos_ref;
> +    struct feat_list *pTmp;
> +    uint64_t *pMask = mask;
> +    int ret;
> +    bool_t found = 0;
> +    unsigned int cos_max = 0;
> +
> +    /* cos_max is the max COS of the feature to be set. */
> +    pTmp = info->pFeat->pNext;
> +    while ( pTmp )
> +    {
> +        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
> +        if ( cos_max > 0 )
> +            break;
> +
> +        pTmp = pTmp->pNext;
> +    }
>  
> +    pTmp = info->pFeat->pNext;
>      for ( cos = 0; cos <= cos_max; cos++ )
>      {
> -        if ( (map[cos].ref || cos == 0) &&
> -             ((!cdp_enabled && map[cos].cbm == cbm_code) ||
> -              (cdp_enabled && map[cos].code == cbm_code &&
> -                              map[cos].data == cbm_data)) )
> +        if ( 0 != cos && 0 == map[cos].ref )
> +            continue;
> +
> +        while ( pTmp )
> +        {
> +            /*
> +             * Compare mask according to feature list order.
> +             * We must follow this order because mask is assembled
> +             * as this order in get_old_set_new().
> +             */
> +            ret = pTmp->ops.compare_mask(pMask, pTmp, cos, &found);
> +            if ( ret < 0 )
> +                return ret;
> +
> +            /* If fail to match, go to next cos to compare. */
> +            if ( !found )
> +                break;
> +
> +            pMask += ret;
> +            pTmp = pTmp->pNext;
> +        }
> +
> +        /* Every feature can match, this cos is what we find. */
> +        if ( found )
>              return cos;
> +
> +        /* Not found, need find again from beginning. */
> +        pTmp = info->pFeat->pNext;
> +        pMask = mask;

Please do once (at the appropriate place inside the outer loop) what
you have done prior to the loop and repeat here. As already alluded
to - limiting the scope of variables also helps ensuring that things
work as intended.

>      }
>  
> +    printk(XENLOG_INFO "%s: cannot find cos.\n", __func__);

Again a questionable log message with a questionable use of __func__.
In any case log messages should normally not have a full stop at their
end.

> -static int pick_avail_cos(struct psr_cat_cbm *map, unsigned int cos_max,
> -                          unsigned int old_cos)
> +static int check_exceed_range(uint64_t *mask, struct feat_list *pTmp,
> +                              unsigned int cos)
>  {
> +    unsigned int ret = 0;

Pointless initializer.

> +static int alloc_new_cos(struct psr_socket_alloc_info *info,
> +                         uint64_t *mask, unsigned int old_cos,
> +                         enum mask_type type)
> +{
> +    unsigned int cos_max = 0;
> +    struct feat_list *pTmp;
>      unsigned int cos;
> +    struct psr_ref *map = info->cos_ref;
> +
> +    /*
> +     * cos_max is the max COS of the feature to be set.
> +     */
> +    pTmp = info->pFeat->pNext;
> +    while ( pTmp )
> +    {
> +        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
> +        if ( cos_max > 0 )
> +            break;
> +
> +        pTmp = pTmp->pNext;
> +    }
> +
> +    if ( !cos_max )
> +        return -ENOENT;
>  
>      /* If old cos is referred only by the domain, then use it. */
>      if ( map[old_cos].ref == 1 && old_cos != 0 )
> +    {
> +        pTmp = info->pFeat->pNext;
> +        if ( check_exceed_range(mask, pTmp, old_cos) )
> +            goto find_new;
> +
>          return old_cos;
> +    }
>  
> +find_new:

Labels indented by at least one blank please. In this case, though,
it looks like you could easily do without any label and goto.

> -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> -                   uint64_t cbm, enum cbm_type type)
> +int psr_set_val(struct domain *d, unsigned int socket,
> +                uint64_t val, enum mask_type type)
>  {
> -    unsigned int old_cos, cos_max;
> +    unsigned int old_cos;
>      int cos, ret;
> -    uint64_t cbm_data, cbm_code;
> -    bool_t cdp_enabled = cdp_is_enabled(socket);
> -    struct psr_cat_cbm *map;
> -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> +    struct psr_ref *map;
> +    uint64_t *mask;
> +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
>  
>      if ( IS_ERR(info) )
>          return PTR_ERR(info);
>  
> -    if ( !psr_check_cbm(info->cbm_len, cbm) )
> -        return -EINVAL;
> -
> -    if ( !cdp_enabled && (type == PSR_CBM_TYPE_L3_CODE ||
> -                          type == PSR_CBM_TYPE_L3_DATA) )
> -        return -ENXIO;
> -
> -    cos_max = info->cos_max;
> +    map = info->cos_ref;
>      old_cos = d->arch.psr_cos_ids[socket];
> -    map = info->cos_to_cbm;
> -
> -    switch ( type )
> -    {
> -    case PSR_CBM_TYPE_L3:
> -        cbm_code = cbm;
> -        cbm_data = cbm;
> -        break;
> -
> -    case PSR_CBM_TYPE_L3_CODE:
> -        cbm_code = cbm;
> -        cbm_data = map[old_cos].data;
> -        break;
> -
> -    case PSR_CBM_TYPE_L3_DATA:
> -        cbm_code = map[old_cos].code;
> -        cbm_data = cbm;
> -        break;
>  
> -    default:
> -        ASSERT_UNREACHABLE();
> -        return -EINVAL;
> -    }
> +    /* Considering CDP liking features */

DYM "like"? And what do you want to say with this comment?

> +    mask = xzalloc_array(uint64_t, info->nr_feat * 2);

Perhaps it's related to the doubling of nr_feat here? If so - what
precludes another future feature to require 3 or 4 slots per element?
I think this needs abstracting out, now that you create an abstract
framework.

> +    if ( (ret = get_old_set_new(mask, info, old_cos, type, val)) != 0 )

What if the xzalloc_array() returned NULL?

> +        return ret;
>  
> -    spin_lock(&info->cbm_lock);
> -    cos = find_cos(map, cos_max, cbm_code, cbm_data, cdp_enabled);
> +    spin_lock(&info->mask_lock);
> +    cos = find_cos(mask, type, info);
>      if ( cos >= 0 )
>      {
>          if ( cos == old_cos )
>          {
> -            spin_unlock(&info->cbm_lock);
> +            spin_unlock(&info->mask_lock);
> +            xfree(mask);
>              return 0;
>          }
>      }
>      else
>      {
> -        cos = pick_avail_cos(map, cos_max, old_cos);
> +        cos = alloc_new_cos(info, mask, old_cos, type);
>          if ( cos < 0 )
>          {
> -            spin_unlock(&info->cbm_lock);
> +            spin_unlock(&info->mask_lock);
> +            xfree(mask);
>              return cos;
>          }
>  
> -        /* We try to avoid writing MSR. */
> -        if ( (cdp_enabled &&
> -             (map[cos].code != cbm_code || map[cos].data != cbm_data)) ||
> -             (!cdp_enabled && map[cos].cbm != cbm_code) )
> +        /* Write all features mask MSRs corresponding to the COS */
> +        ret = write_psr_ref(socket, cos, mask);
> +        if ( ret )
>          {
> -            ret = write_l3_cbm(socket, cos, cbm_code, cbm_data, cdp_enabled);
> -            if ( ret )
> -            {
> -                spin_unlock(&info->cbm_lock);
> -                return ret;
> -            }
> -            map[cos].code = cbm_code;
> -            map[cos].data = cbm_data;
> +            spin_unlock(&info->mask_lock);
> +            xfree(mask);
> +            return ret;
>          }
>      }
>  
>      map[cos].ref++;
>      map[old_cos].ref--;
> -    spin_unlock(&info->cbm_lock);
> +
> +    spin_unlock(&info->mask_lock);
>  
>      d->arch.psr_cos_ids[socket] = cos;
> +    xfree(mask);
> +    mask = NULL;
>  
>      return 0;

I don't think setting mask to NULL right ahead of a return statement
is very helpful.

> -static void cat_cpu_init(void)
> +static void internal_cpu_init(void)

Is "internal_" really a useful prefix here?

>  {
>      unsigned int eax, ebx, ecx, edx;
> -    struct psr_cat_socket_info *info;
> +    struct psr_socket_alloc_info *info;
>      unsigned int socket;
>      unsigned int cpu = smp_processor_id();
> -    uint64_t val;
>      const struct cpuinfo_x86 *c = cpu_data + cpu;
> +    struct feat_list *pTmp;
>  
>      if ( !cpu_has(c, X86_FEATURE_PQE) || c->cpuid_level < PSR_CPUID_LEVEL_CAT )
>          return;
>  
>      socket = cpu_to_socket(cpu);
> -    if ( test_bit(socket, cat_socket_enable) )
> +    info = socket_alloc_info + socket;
> +    if ( info->features )
>          return;
>  
>      cpuid_count(PSR_CPUID_LEVEL_CAT, 0, &eax, &ebx, &ecx, &edx);
>      if ( ebx & PSR_RESOURCE_TYPE_L3 )
>      {
> -        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> -        info = cat_socket_info + socket;
> -        info->cbm_len = (eax & 0x1f) + 1;
> -        info->cos_max = min(opt_cos_max, edx & 0xffff);
> -
> -        info->cos_to_cbm = temp_cos_to_cbm;
> -        temp_cos_to_cbm = NULL;
> -        /* cos=0 is reserved as default cbm(all ones). */
> -        info->cos_to_cbm[0].cbm = (1ull << info->cbm_len) - 1;
> +        pTmp = pL3CAT;
> +        if ( !pTmp )
> +            return;

You check the allocation result in internal_cpu_prepare() - can you
really get here with this being NULL? If you can't, drop the if() (or
make it an ASSERT()).

> +        pL3CAT = NULL;
>  
> -        spin_lock_init(&info->cbm_lock);
> -
> -        set_bit(socket, cat_socket_enable);
> -
> -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
> -        {
> -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
> -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
> -
> -            /* We only write mask1 since mask0 is always all ones by default. */
> -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
> -
> -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
> -
> -            /* Cut half of cos_max when CDP is enabled. */
> -            info->cos_max >>= 1;
> -
> -            set_bit(socket, cdp_socket_enable);
> -        }
> -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u, CDP:%s\n",
> -               socket, info->cos_max, info->cbm_len,
> -               cdp_is_enabled(socket) ? "on" : "off");
> +        /* Initialize this feature according to CPUID. */
> +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> +        pTmp->ops = l3_cat_ops;
> +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);

This function can return without having consumed pTmp, which means
you'd leak the memory in that case.

> -static void __init psr_cat_free(void)
> +static void __init psr_free(void)
>  {
> -    xfree(cat_socket_enable);
> -    cat_socket_enable = NULL;
> -    xfree(cat_socket_info);
> -    cat_socket_info = NULL;
> +    int i;

unsigned int

> -static void __init init_psr_cat(void)
> +static void __init init_psr(void)
>  {
> +    int i;

Again.

> +
>      if ( opt_cos_max < 1 )
>      {
>          printk(XENLOG_INFO "CAT: disabled, cos_max is too small\n");
>          return;
>      }
>  
> -    cat_socket_enable = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_sockets));
> -    cat_socket_info = xzalloc_array(struct psr_cat_socket_info, nr_sockets);
> -    cdp_socket_enable = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_sockets));
> +    socket_alloc_info = xzalloc_array(struct psr_socket_alloc_info, nr_sockets);
> +
> +    if ( !socket_alloc_info )
> +    {
> +        printk(XENLOG_INFO "Fail to alloc socket_alloc_info!\n");
> +        return;
> +    }
>  
> -    if ( !cat_socket_enable || !cat_socket_info )
> -        psr_cat_free();
> +    for ( i = 0; i < nr_sockets; i++ )
> +    {
> +        socket_alloc_info[i].pFeat =  xzalloc(struct feat_list);

What's the point of this dummy list element?

> +        if ( NULL == socket_alloc_info[i].pFeat )
> +        {
> +            printk(XENLOG_INFO "Fail to alloc pFeat!\n");

How would the reader of the log easily associate such a message
with something having gone wrong in PSR initialization?

> +            return;
> +        }
> +        socket_alloc_info[i].pFeat->pNext = NULL;

This is redundant with the use of xzalloc().

>  static void psr_cpu_init(void)
>  {
> -    if ( cat_socket_info )
> -        cat_cpu_init();
> +    unsigned int socket;
> +    struct psr_socket_alloc_info *info;
> +
> +    if ( socket_alloc_info )
> +    {
> +        socket = cpu_to_socket(smp_processor_id());
> +        info = socket_alloc_info + socket;
> +        info->cos_ref = temp_cos_ref;
> +        temp_cos_ref = NULL;
> +        spin_lock_init(&info->mask_lock);
> +
> +        internal_cpu_init();

Why is everything ahead of this call not part of that function?

> --- a/xen/include/asm-x86/psr.h
> +++ b/xen/include/asm-x86/psr.h
> @@ -46,10 +46,10 @@ struct psr_cmt {
>      struct psr_cmt_l3 l3;
>  };
>  
> -enum cbm_type {
> -    PSR_CBM_TYPE_L3,
> -    PSR_CBM_TYPE_L3_CODE,
> -    PSR_CBM_TYPE_L3_DATA,
> +enum mask_type {
> +    PSR_MASK_TYPE_L3_CBM,
> +    PSR_MASK_TYPE_L3_CODE,
> +    PSR_MASK_TYPE_L3_DATA,
>  };

The enum tag is now too generic a name - please make this
psr_mask_type or something similar.

Jan
Yi Sun Sept. 7, 2016, 7:12 a.m. UTC | #2
Hi, Jan,

Thank you very much for reviewing my patches in details! Please
check my comments below.

On 16-09-06 01:40:00, Jan Beulich wrote:
> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -23,22 +23,116 @@
> >  #define PSR_CAT        (1<<1)
> >  #define PSR_CDP        (1<<2)
> >  
> > -struct psr_cat_cbm {
> > -    union {
> > -        uint64_t cbm;
> > -        struct {
> > -            uint64_t code;
> > -            uint64_t data;
> > -        };
> > -    };
> > +/*
> > + * To add a new PSR feature, please follow below steps:
> > + * 1. Implement feature specific callback functions listed in
> > + *    "struct feat_ops";
> > + * 2. Implement a feature specific "struct feat_ops" object and register
> > + *    feature specific callback functions into it;
> > + * 3. Delcare feat_list object for the feature and malloc memory for it
> > + *    in internal_cpu_prepare(). Correspondingly, free them in
> > + *    free_feature();
> > + * 4. Add feature initialization codes in internal_cpu_init().
> > + */
> > +
> > +struct feat_list;
> > +struct psr_socket_alloc_info;
> > +
> > +/* Every feature enabled should implement such ops and callback functions. */
> > +struct feat_ops {
> > +    /*
> > +     * init_feature is used in cpu initialization process to do feature
> > +     * specific initialization works.
> > +     */
> > +    void (*init_feature)(unsigned int eax, unsigned int ebx,
> > +                         unsigned int ecx, unsigned int edx,
> > +                         struct feat_list *pFeat,
> 
> Here and below - we don't normally use identifiers with capital letters
> in the middle.
> 
Thanks! I will fix it.

> > +                         struct psr_socket_alloc_info *info);
> > +    /*
> > +     * get_old_set_new is used in set value process to get all features'
> > +     * COS registers values according to original cos id of the domain.
> > +     * Then, assemble them into an mask array as feature list order.
> 
> This sentence in particular doesn't make any sense to me. What
> follows below also looks like it is in need of improvement.
> 
Do you mean the comments are not accurate? How about below description?

get_old_set_new will traverse all features in list. It is used to do below two
things:
1. get old_cos register value of all supported features and
2. set the new value for appointed feature.

All the values are set into mask array according the traversal order, meaning
the same order of feature list members.

For example, old_cos of the domain is 1. User wants to set L3 CAT CBM to 0x1ff.
The original COS registers like below.

        -----------------------
        | COS 0 | COS 1 | ... |
        -----------------------
L3 CAT  | 0x7ff | 0x3ff | ... |
        -----------------------
L2 CAT  | 0xff  | 0x3f  | ... |
        -----------------------

Then, mask array should be:
mask[0]: 0x1ff
mask[1]: 0x3f

Then, we can use this mask array to find if there is matching COS through
compare_mask().

> > +     * Then, set the new feature value into feature corresponding position
> > +     * in the array. This function is used to pair with compare_mask.
> > +     */
> > +    int (*get_old_set_new)(uint64_t *mask,
> > +                           struct feat_list *pFeat,
> > +                           unsigned int old_cos,
> > +                           enum mask_type type,
> > +                           uint64_t m);
> > +    /*
> > +     * compare_mask is used to in set value process to compare if the
> > +     * input mask array can match all the features' COS registers values
> > +     * according to input cos id.
> > +     */
> > +    int (*compare_mask)(uint64_t *mask, struct feat_list *pFeat,
> > +                        unsigned int cos, bool_t *found);
> 
> For a "compare" function I'd expect a word on the meaning of the
> return value, and I'd expect most if not all pointer parameters to
> be pointers to const.
>
Thanks! I will explain the return value.
 
> Also plain bool please.
> 
Thanks! I will fix it.

> > +    /*
> > +     * get_cos_max_as_type is used to get the cos_max value of the feature
> > +     * according to input mask_type.
> > +     */
> > +    unsigned int (*get_cos_max_as_type)(struct feat_list *pFeat,
> > +                                        enum mask_type type);
> 
> DYM mean "get_cos_max_from_type()"?
> 
Yes, correct. I will revise the function name.

> > +    /*
> > +     * exceed_range is used to check if the input cos id exceeds the
> > +     * feature's cos_max and if the input mask value is not the default one.
> > +     * Even if the associated cos exceeds the cos_max, HW can work as default
> > +     * value. That is the reason we need check is mask value is default one.
> > +     * If both criteria are fulfilled, that means the input exceeds the
> > +     * range.
> > +     */
> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
> > +                                 unsigned int cos);
> 
> According to the comment this is kind of a predicate, which seems
> unlikely to return an unsigned value. In fact without a word on the
> return value I'd expect such to return bool. And I'd also expect the
> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
> for compare above I wonder whether come or all of the parameters
> should be pointers to const (please go over the entire patch and do
> constification wherever possible/sensible).
> 
Yes, you are right. I will change the function type to bool and add const
for not changed input pointers.

This function is used to check if the input cos id exceeds the cos_max. If yes
and the set value is not default value, we should return error. So, I think
to change the function name to exceed_cos_max(). How do you think?

> > +    /*
> > +     * write_msr is used to write feature specific MSR registers.
> > +     */
> 
> This is a single line comment.
> 
Yes, will change the format. Thanks!

> > +    int (*write_msr)(unsigned int cos, uint64_t *mask,
> > +                     struct feat_list *pFeat);
> 
> Such write operations don't normally alter the value to be written.
> If that's different here, that's perhaps worth a word in the comment.
> If it isn't, then I don't see why the address of the value gets passed
> instead of the value itself.
> 
> Oh, having gone further in the patch I see this is to update multiple
> MSRs. I guess this would best be reflected by making the parameter
> const uint64_t masks[] (albeit it's also not clear to me why this
> necessarily have to be masks kind items, and not just arbitrary
> values - proper choice of variable and parameter names helps the
> understanding).
> 
Thanks for your suggestion! Will change it to uint64_t val[].

> > +#define MAX_FEAT_INFO_SIZE 8
> > +#define MAX_COS_REG_NUM  128
> 
> Are these numbers arbitrary, or based on something?
> 
MAX_FEAT_INFO_SIZE is got from the sizeof(struct psr_cat_lvl_info) and
consider the extension for future feature.

MAX_COS_REG_NUM is got from spec that the max COS registers number is 128
for all PSR features so far.

> > +struct psr_socket_alloc_info {
> 
> I've yet to see whether the "alloc" in the name really makes sense.
>
Thanks! Will change it.
 
> > @@ -58,8 +150,427 @@ static unsigned int __read_mostly opt_cos_max = 255;
> >  static uint64_t rmid_mask;
> >  static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
> >  
> > -static struct psr_cat_cbm *temp_cos_to_cbm;
> > +static struct psr_ref *temp_cos_ref;
> > +/* Every feature has its own object. */
> > +struct feat_list *pL3CAT;
> 
> static. And a better name please, more in line with the other variable
> or similar purpose.
> 
Thanks! Will modify it.

> > +/* Common functions for supporting feature callback functions. */
> > +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
> > +{
> > +    if ( NULL == pHead || NULL == pTmp )
> > +        return;
> > +
> > +    while ( pHead->pNext )
> > +        pHead = pHead->pNext;
> > +
> > +    pHead->pNext = pTmp;
> > +}
> 
> Do you really need custom list management here?
> 
It seems xen list interfaces require the input list be a double linked list but
my list is a single linked list. Furthermore, I only need simple add to tail
function and free function. So I create custom list management functions.

> > +static void free_feature(struct psr_socket_alloc_info *info)
> > +{
> > +    struct feat_list *pTmp;
> > +    struct feat_list *pPrev;
> > +
> > +    if ( NULL == info )
> > +        return;
> > +
> > +    if ( NULL == info->pFeat )
> > +        return;
> > +
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        pPrev = pTmp;
> > +        pTmp = pTmp->pNext;
> > +        clear_bit(pPrev->feature, &(info->features));
> > +        xfree(pPrev);
> > +        pPrev = NULL;
> > +    }
> > +}
> 
> Why does info->pFeat not get freed here?
> 
The info->pFeat is a redundant list head to facilitate list operations.
It is only freed when socket info is freed.

With this list head, the advantage is that we can insert/delete the first
element same as others without special operations.

> > +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
> > +                                unsigned int ecx, unsigned int edx,
> > +                                struct feat_list *pFeat,
> > +                                struct psr_socket_alloc_info *info)
> > +{
> > +    struct psr_cat_lvl_info l3_cat;
> > +    unsigned int socket;
> > +    uint64_t val;
> > +
> > +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
> > +        return;
> 
> DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
> array is one with unsigned int as the base type, which doesn't get
> reflected here.
> 
I will use BUILD_BUG_ON() for the error case.

As my explanation for MAX_FEAT_INFO_SIZE above, it reflects a size more
than struct psr_cat_lvl_info. So, if MAX_FEAT_INFO_SIZE is bigger than the
sizeof(struct psr_cat_lvl_info), the following operations on feat_info[]
should be good.

> > +    /* No valid value so do not enable feature. */
> > +    if ( 0 == eax || 0 == edx )
> > +        return;
> > +
> > +    l3_cat.cbm_len = (eax & 0x1f) + 1;
> > +    l3_cat.cos_max = min(opt_cos_max, edx & 0xffff);
> >  
> > +    /* cos=0 is reserved as default cbm(all ones). */
> > +    pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> > +
> > +    pFeat->feature = PSR_SOCKET_L3_CAT;
> > +    set_bit(PSR_SOCKET_L3_CAT, &(info->features));
> 
> Does this need to be set_bit(), or would __set_bit() suffice? Also
> there are many cases of this strange &() construct - the
> parentheses are pointless there.
> 
This is inherited from original codes. After checking codes, I think this can
be replaced by non-atomic operation __set_bit(). Thanks!

Will remove unnecessary parentheses.

> > +    if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> > +         !test_bit(PSR_SOCKET_L3_CDP, &(info->features)) )
> > +    {
> > +        /* Data */
> > +        pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
> > +        /* Code */
> > +        pFeat->cos_reg_val[1] = (1ull << l3_cat.cbm_len) - 1;
> > +
> > +        /* We only write mask1 since mask0 is always all ones by default. */
> > +        wrmsrl(MSR_IA32_PSR_L3_MASK(1),
> > +               (1ull << l3_cat.cbm_len) - 1);
> > +        rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> > +        wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
> > +
> > +        /* Cut half of cos_max when CDP is enabled. */
> > +        l3_cat.cos_max >>= 1;
> > +
> > +        pFeat->feature = PSR_SOCKET_L3_CDP;
> > +        set_bit(PSR_SOCKET_L3_CDP, &(info->features));
> > +        clear_bit(PSR_SOCKET_L3_CAT, &(info->features));
> 
> A few lines up you set this bit, and now you clear it again. Please
> instead set it in an else branch below.
> 
Ok, thanks for the suggestion!

> > +    }
> > +
> > +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
> 
> Aforementioned BUILD_BUG_ON() would probably bets be placed
> right ahead of this memcpy().
> 
Can we call BUILD_BUG_ON() at the top of the function to avoid executing above
process if the size is not right?

> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> > +                               unsigned int cos, bool_t *found)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +    uint64_t l3_def_cbm;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> 
> Already here I think this memcpy()ing gets unwieldy. Can't you
> simply make the structure field a union of all types of interest?
> 
Sorry that I am not very clear about your meaning to make a union. Do you mean
make feat_info a union? If so, it will lose the universality to cover all
features. Future feature may have different info.

I think I can replace the memcpy() to directly assign value to cat_info.

> > +    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
> > +
> > +    /* CDP */
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +    {
> > +        if ( cos > cat_info.cos_max )
> > +        {
> > +            if ( mask[0] != l3_def_cbm ||
> > +                 mask[1] != l3_def_cbm )
> > +            {
> > +                *found = 0;
> 
> false and true for literal boolean values please.
> 
Thanks! Will do it.

> > +                printk(XENLOG_ERR "CDP exceed cos max.\n");
> 
> I haven't figured yet at what times this function will get called, but
> I suspect it's inappropriate for it to log messages.
>
Ok, will remove the log. Thanks! 

> > +                return -ENOENT;
> > +            }
> > +            *found = 1;
> > +            return 2;
> 
> Who or what is 2?
> 
CDP occupies 2 members in mask[] array. So, it returns 2 to make codes outside
the function to skip to next feature's mask value.

> > +        }
> > +
> > +        if ( mask[0] == pFeat->cos_reg_val[cos * 2] &&
> > +             mask[1] == pFeat->cos_reg_val[cos * 2 + 1])
> > +            *found = 1;
> > +        else
> > +            *found = 0;
> > +
> > +        return 2;
> > +    }
> > +
> > +    /* CAT */
> > +    if ( cos > cat_info.cos_max )
> > +    {
> > +        if ( mask[0] != l3_def_cbm )
> > +        {
> > +            *found = 0;
> > +            printk(XENLOG_ERR "CAT exceed cos max.\n");
> > +            return -ENOENT;
> > +        }
> > +        *found = 1;
> > +        return 1;
> > +    }
> > +
> > +    if ( mask[0] == pFeat->cos_reg_val[cos] )
> > +        *found = 1;
> > +    else
> > +        *found = 0;
> 
> Please use a simple assignment in cases like this, instead of if/else.
> 
Do you mean codes like below?

*found = 0;
if ( mask[0] == pFeat->cos_reg_val[cos] )
    *found = 1;

> > +static unsigned int l3_cat_exceed_range(uint64_t *mask, struct feat_list *pFeat,
> > +                                        unsigned int cos)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +    uint64_t l3_def_cbm;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> > +    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
> > +
> > +    /* CDP */
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +    {
> > +        if ( cos > cat_info.cos_max )
> > +            if ( mask[0] != l3_def_cbm ||
> > +                 mask[1] != l3_def_cbm )
> 
> Please combine such if()s.
> 
Sure. Thanks!

> > +                /*
> > +                 * Exceed cos_max and value to set is not default,
> > +                 * return error.
> > +                 */
> > +                return 0;
> > +
> > +        return 2;
> 
> Same question as above - who or what is 2?
> 
Please check above comment.

> > +static int l3_cat_get_old_set_new(uint64_t *mask,
> > +                                  struct feat_list *pFeat,
> > +                                  unsigned int old_cos,
> > +                                  enum mask_type type,
> > +                                  uint64_t m)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> > +
> > +    if ( type == PSR_MASK_TYPE_L3_DATA ||
> > +         type == PSR_MASK_TYPE_L3_CODE ||
> > +         type == PSR_MASK_TYPE_L3_CBM )
> > +    {
> > +        if ( !psr_check_cbm(cat_info.cbm_len, m) )
> > +            return -EINVAL;
> > +    }
> > +
> > +    if ( ( type == PSR_MASK_TYPE_L3_DATA ||
> > +         type == PSR_MASK_TYPE_L3_CODE ) &&
> > +         pFeat->feature != PSR_SOCKET_L3_CDP )
> > +        return -ENXIO;
> 
> Please either combine the two if()s into a single switch() (with
> appropriate fall through behavior) or, perhaps better, move
> this second if() - suitably shrunk - past the following one. This is
> to avoid redundant code, which makes things more difficult to
> follow.
> 
Thanks! Will make modification.

> > +    /* CDP */
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +    {
> > +        if ( old_cos > cat_info.cos_max )
> > +        {
> > +            /* Data */
> > +            mask[0] = pFeat->cos_reg_val[0];
> > +            /* Code */
> > +            mask[1] = pFeat->cos_reg_val[1];
> > +        }
> > +        else
> > +        {
> > +            /* Data */
> > +            mask[0] = pFeat->cos_reg_val[old_cos * 2];
> > +            /* Code */
> > +            mask[1] = pFeat->cos_reg_val[old_cos * 2 + 1];
> > +        }
> > +
> > +        /* Set new mask */
> > +        if ( type == PSR_MASK_TYPE_L3_DATA )
> > +            mask[0] = m;
> > +        if ( type == PSR_MASK_TYPE_L3_CODE )
> > +            mask[1] = m;
> > +
> > +        return 2;
> > +    }
> > +
> > +    /* CAT */
> > +    if ( old_cos > cat_info.cos_max )
> > +        mask[0] =  pFeat->cos_reg_val[0];
> > +    else
> > +        mask[0] =  pFeat->cos_reg_val[old_cos];
> 
> Across the entire function - wouldn't it be easier to range check
> old_cos once up front, forcing it to zero if the check fails? That
> would reduce code size to a better readable amount. (And just to
> avoid any misunderstanding - comments like this also may apply
> to other functions; I'm not going to repeat them there.)
> 
Good idea! Thanks for the suggestion. I will check other functions like this.

> Also note that there are undue double blanks in here.
> 
Do you mean two spaces? I cannot see it in my original patch and codes.

> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
> > +        mask[0] = m;
> 
> This overwriting behavior also looks quite strange to me. What's
> the purpose? And if this really is meant to be that way, why is
> this not (leaving aside the other suggested adjustment)
> 
>     if ( type == PSR_MASK_TYPE_L3_CBM )
>         mask[0] = m;
>     else if ( old_cos > cat_info.cos_max )
>         mask[0] = pFeat->cos_reg_val[0];
>     else
>         mask[0] = pFeat->cos_reg_val[old_cos];
> 
> ?
> 
get_old_set_new() is used to do below two things:
1. get old_cos register value of all supported features and
2. set the new value for appointed feature.

So, if the appointed feature is L3 CAT, we should set input vallue for it here.

> > +static int l3_cat_get_feat_info(struct feat_list *pFeat, enum mask_type type,
> > +                                uint32_t *dat0, uint32_t *dat1,
> > +                                uint32_t *dat2)
> > +{
> > +    struct psr_cat_lvl_info cat_info;
> > +
> > +    if ( type != PSR_MASK_TYPE_L3_DATA &&
> > +         type != PSR_MASK_TYPE_L3_CODE &&
> > +         type != PSR_MASK_TYPE_L3_CBM )
> > +        return 0;
> > +
> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> > +
> > +    *dat0 = cat_info.cbm_len;
> > +    *dat1 = cat_info.cos_max;
> > +
> > +    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
> > +        *dat2 |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> > +    else
> > +        *dat2 = 0;
> > +
> > +    return 1;
> > +}
> 
> The latest here it becomes clear that you want to use switch()
> much more widely.
> 
Ok, will consider it carefully.

> >  static inline void psr_assoc_init(void)
> >  {
> >      struct psr_assoc *psra = &this_cpu(psr_assoc);
> > +    struct psr_socket_alloc_info *info;
> > +    unsigned int cos_max;
> > +    unsigned int socket;
> >  
> > -    if ( cat_socket_info )
> > +    if ( socket_alloc_info )
> >      {
> > -        unsigned int socket = cpu_to_socket(smp_processor_id());
> > +        socket = cpu_to_socket(smp_processor_id());
> 
> What's the point of moving the variable declaration? Quite the
> other way around, the new variable you add could - afaict - all
> live in this more narrow scope. And info could presumably become
> a pointer to const.
> 
I remember it reports warning or error when compiling old codes.

> > -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
> > -                        uint32_t *cos_max, uint32_t *flags)
> > +int psr_get_info(unsigned int socket, enum mask_type type,
> > +                 uint32_t *dat0, uint32_t *dat1,
> > +                 uint32_t *dat2)
> >  {
> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> > +    struct feat_list *pTmp;
> >  
> >      if ( IS_ERR(info) )
> >          return PTR_ERR(info);
> >  
> > -    *cbm_len = info->cbm_len;
> > -    *cos_max = info->cos_max;
> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
> > +       return -ENODEV;
> >  
> > -    *flags = 0;
> > -    if ( cdp_is_enabled(socket) )
> > -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
> 
> Considering that this is one of the generic ops: What guarantees
> that 3 elements will suffice going forward? I.e. why is this not an
> array?
> 
Good point. I consider this too. Per all features we know, three parameters
are sufficient. To simplify codes, I write the function like this. I think
we can make change if future feature has more requirements.

If you like to get things done one time, I will make change.

> > -int psr_get_l3_cbm(struct domain *d, unsigned int socket,
> > -                   uint64_t *cbm, enum cbm_type type)
> > +static int get_old_set_new(uint64_t *mask,
> > +                           struct psr_socket_alloc_info *info,
> > +                           unsigned int old_cos,
> > +                           enum mask_type type,
> > +                           uint64_t m)
> >  {
> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> > -    bool_t cdp_enabled = cdp_is_enabled(socket);
> > -
> > -    if ( IS_ERR(info) )
> > -        return PTR_ERR(info);
> > +    struct feat_list *pTmp;
> > +    int ret;
> >  
> > -    switch ( type )
> > -    {
> > -    case PSR_CBM_TYPE_L3:
> > -        if ( cdp_enabled )
> > -            return -EXDEV;
> > -        *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> > -        break;
> > +    if ( !mask )
> > +        return -EINVAL;
> >  
> > -    case PSR_CBM_TYPE_L3_CODE:
> > -        if ( !cdp_enabled )
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> > -        else
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].code;
> > -        break;
> > +    if ( !info || !info->pFeat )
> > +        return -ENODEV;
> 
> Earlier on you used (slightly oddly arranged) checks against NULL.
> Please be consistent (and I'd prefer if you used ! across the board).
> 
Got it, thanks!

> > -    case PSR_CBM_TYPE_L3_DATA:
> > -        if ( !cdp_enabled )
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> > -        else
> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
> > -        break;
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        /* mask getting and setting order is same as feature list */
> > +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
> > +        if ( ret < 0 )
> > +            return ret;
> >  
> > -    default:
> > -        ASSERT_UNREACHABLE();
> > +        mask += ret;
> 
> With this - what ensures the array isn't going to be overrun?
> 
It seems you missed one line below. The while() loop traverse all features in
list. It will exit when the last member has been accessed.

+        pTmp = pTmp->pNext;

> And I'm also having some difficulty seeing how an array of outputs
> matches up with exactly one input (m). Can you explain this please?
> 
This while loop will traverse all features in list. It will get all features
old_cos value back. If the feature is the one to set new value, the input
new value (m) will replace the old_cos value. All the value are set into mask
array according the traversal order, meaning the same order of feature list
members, like L3 CAT as the first and L2 CAT as the second.

For example, old_cos of the domain is 1. User wants to set L3 CAT CBM to 0x1ff.
The original COS registers like below.

        -----------------------
        | COS 0 | COS 1 | ... |
        -----------------------
L3 CAT  | 0x7ff | 0x3ff | ... |
        -----------------------
L2 CAT  | 0xff  | 0x3f  | ... |
        -----------------------

Then, mask array should be:
mask[0]: 0x1ff
mask[1]: 0x3f

Then, we can use this mask array to find if there is matching COS through
compare_mask().

> > -static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
> > +int psr_get_val(struct domain *d, unsigned int socket,
> > +                uint64_t *val, enum mask_type type)
> >  {
> > -    unsigned int first_bit, zero_bit;
> > -
> > -    /* Set bits should only in the range of [0, cbm_len). */
> > -    if ( cbm & (~0ull << cbm_len) )
> > -        return 0;
> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> > +    unsigned int cos = d->arch.psr_cos_ids[socket];
> > +    struct feat_list *pTmp;
> > +    int ret;
> >  
> > -    /* At least one bit need to be set. */
> > -    if ( cbm == 0 )
> > -        return 0;
> > +    if ( IS_ERR(info) )
> > +        return PTR_ERR(info);
> >  
> > -    first_bit = find_first_bit(&cbm, cbm_len);
> > -    zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext )
> > +        return -ENODEV;
> >  
> > -    /* Set bits should be contiguous. */
> > -    if ( zero_bit < cbm_len &&
> > -         find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
> > -        return 0;
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        ret = pTmp->ops.get_val(pTmp, cos, type, val);
> > +        if ( ret < 0 )
> > +            return -EINVAL;
> 
> Why don't you return "ret" here?
> 
This is an unnecessary check. Will remove it.

> > +        else if ( ret > 0 )
> 
> Pointless "else".
Will remove it.

> 
> > -static void do_write_l3_cbm(void *data)
> > +static void do_write_psr_ref(void *data)
> 
> What does "_ref" stand for?
> 
Sorry, a mistake.

> > -static int find_cos(struct psr_cat_cbm *map, unsigned int cos_max,
> > -                    uint64_t cbm_code, uint64_t cbm_data, bool_t cdp_enabled)
> > +static int find_cos(uint64_t *mask, enum mask_type type,
> > +                    struct psr_socket_alloc_info *info)
> >  {
> >      unsigned int cos;
> > +    struct psr_ref *map = info->cos_ref;
> > +    struct feat_list *pTmp;
> > +    uint64_t *pMask = mask;
> > +    int ret;
> > +    bool_t found = 0;
> > +    unsigned int cos_max = 0;
> > +
> > +    /* cos_max is the max COS of the feature to be set. */
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
> > +        if ( cos_max > 0 )
> > +            break;
> > +
> > +        pTmp = pTmp->pNext;
> > +    }
> >  
> > +    pTmp = info->pFeat->pNext;
> >      for ( cos = 0; cos <= cos_max; cos++ )
> >      {
> > -        if ( (map[cos].ref || cos == 0) &&
> > -             ((!cdp_enabled && map[cos].cbm == cbm_code) ||
> > -              (cdp_enabled && map[cos].code == cbm_code &&
> > -                              map[cos].data == cbm_data)) )
> > +        if ( 0 != cos && 0 == map[cos].ref )
> > +            continue;
> > +
> > +        while ( pTmp )
> > +        {
> > +            /*
> > +             * Compare mask according to feature list order.
> > +             * We must follow this order because mask is assembled
> > +             * as this order in get_old_set_new().
> > +             */
> > +            ret = pTmp->ops.compare_mask(pMask, pTmp, cos, &found);
> > +            if ( ret < 0 )
> > +                return ret;
> > +
> > +            /* If fail to match, go to next cos to compare. */
> > +            if ( !found )
> > +                break;
> > +
> > +            pMask += ret;
> > +            pTmp = pTmp->pNext;
> > +        }
> > +
> > +        /* Every feature can match, this cos is what we find. */
> > +        if ( found )
> >              return cos;
> > +
> > +        /* Not found, need find again from beginning. */
> > +        pTmp = info->pFeat->pNext;
> > +        pMask = mask;
> 
> Please do once (at the appropriate place inside the outer loop) what
> you have done prior to the loop and repeat here. As already alluded
> to - limiting the scope of variables also helps ensuring that things
> work as intended.
> 
Good suggestion, thanks!

> >      }
> >  
> > +    printk(XENLOG_INFO "%s: cannot find cos.\n", __func__);
> 
> Again a questionable log message with a questionable use of __func__.
> In any case log messages should normally not have a full stop at their
> end.
> 
Will remove this log.

> > -static int pick_avail_cos(struct psr_cat_cbm *map, unsigned int cos_max,
> > -                          unsigned int old_cos)
> > +static int check_exceed_range(uint64_t *mask, struct feat_list *pTmp,
> > +                              unsigned int cos)
> >  {
> > +    unsigned int ret = 0;
> 
> Pointless initializer.
> 
Will remove it.

> > +static int alloc_new_cos(struct psr_socket_alloc_info *info,
> > +                         uint64_t *mask, unsigned int old_cos,
> > +                         enum mask_type type)
> > +{
> > +    unsigned int cos_max = 0;
> > +    struct feat_list *pTmp;
> >      unsigned int cos;
> > +    struct psr_ref *map = info->cos_ref;
> > +
> > +    /*
> > +     * cos_max is the max COS of the feature to be set.
> > +     */
> > +    pTmp = info->pFeat->pNext;
> > +    while ( pTmp )
> > +    {
> > +        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
> > +        if ( cos_max > 0 )
> > +            break;
> > +
> > +        pTmp = pTmp->pNext;
> > +    }
> > +
> > +    if ( !cos_max )
> > +        return -ENOENT;
> >  
> >      /* If old cos is referred only by the domain, then use it. */
> >      if ( map[old_cos].ref == 1 && old_cos != 0 )
> > +    {
> > +        pTmp = info->pFeat->pNext;
> > +        if ( check_exceed_range(mask, pTmp, old_cos) )
> > +            goto find_new;
> > +
> >          return old_cos;
> > +    }
> >  
> > +find_new:
> 
> Labels indented by at least one blank please. In this case, though,
> it looks like you could easily do without any label and goto.
> 
Ok, I will consider not to use goto and label here. At least, will add one
blank. Thanks!

> > -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> > -                   uint64_t cbm, enum cbm_type type)
> > +int psr_set_val(struct domain *d, unsigned int socket,
> > +                uint64_t val, enum mask_type type)
> >  {
> > -    unsigned int old_cos, cos_max;
> > +    unsigned int old_cos;
> >      int cos, ret;
> > -    uint64_t cbm_data, cbm_code;
> > -    bool_t cdp_enabled = cdp_is_enabled(socket);
> > -    struct psr_cat_cbm *map;
> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> > +    struct psr_ref *map;
> > +    uint64_t *mask;
> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> >  
> >      if ( IS_ERR(info) )
> >          return PTR_ERR(info);
> >  
> > -    if ( !psr_check_cbm(info->cbm_len, cbm) )
> > -        return -EINVAL;
> > -
> > -    if ( !cdp_enabled && (type == PSR_CBM_TYPE_L3_CODE ||
> > -                          type == PSR_CBM_TYPE_L3_DATA) )
> > -        return -ENXIO;
> > -
> > -    cos_max = info->cos_max;
> > +    map = info->cos_ref;
> >      old_cos = d->arch.psr_cos_ids[socket];
> > -    map = info->cos_to_cbm;
> > -
> > -    switch ( type )
> > -    {
> > -    case PSR_CBM_TYPE_L3:
> > -        cbm_code = cbm;
> > -        cbm_data = cbm;
> > -        break;
> > -
> > -    case PSR_CBM_TYPE_L3_CODE:
> > -        cbm_code = cbm;
> > -        cbm_data = map[old_cos].data;
> > -        break;
> > -
> > -    case PSR_CBM_TYPE_L3_DATA:
> > -        cbm_code = map[old_cos].code;
> > -        cbm_data = cbm;
> > -        break;
> >  
> > -    default:
> > -        ASSERT_UNREACHABLE();
> > -        return -EINVAL;
> > -    }
> > +    /* Considering CDP liking features */
> 
> DYM "like"? And what do you want to say with this comment?
> 
Yes, "like" should be. CDP needs occupy two members in mask array. So, I
make nr_feat multiply 2 as the array size.

> > +    mask = xzalloc_array(uint64_t, info->nr_feat * 2);
> 
> Perhaps it's related to the doubling of nr_feat here? If so - what
> precludes another future feature to require 3 or 4 slots per element?
> I think this needs abstracting out, now that you create an abstract
> framework.
> 
Good point. I will consider it.

> > +    if ( (ret = get_old_set_new(mask, info, old_cos, type, val)) != 0 )
> 
> What if the xzalloc_array() returned NULL?
> 
Sorry, missed the check.

> > +        return ret;
> >  
> > -    spin_lock(&info->cbm_lock);
> > -    cos = find_cos(map, cos_max, cbm_code, cbm_data, cdp_enabled);
> > +    spin_lock(&info->mask_lock);
> > +    cos = find_cos(mask, type, info);
> >      if ( cos >= 0 )
> >      {
> >          if ( cos == old_cos )
> >          {
> > -            spin_unlock(&info->cbm_lock);
> > +            spin_unlock(&info->mask_lock);
> > +            xfree(mask);
> >              return 0;
> >          }
> >      }
> >      else
> >      {
> > -        cos = pick_avail_cos(map, cos_max, old_cos);
> > +        cos = alloc_new_cos(info, mask, old_cos, type);
> >          if ( cos < 0 )
> >          {
> > -            spin_unlock(&info->cbm_lock);
> > +            spin_unlock(&info->mask_lock);
> > +            xfree(mask);
> >              return cos;
> >          }
> >  
> > -        /* We try to avoid writing MSR. */
> > -        if ( (cdp_enabled &&
> > -             (map[cos].code != cbm_code || map[cos].data != cbm_data)) ||
> > -             (!cdp_enabled && map[cos].cbm != cbm_code) )
> > +        /* Write all features mask MSRs corresponding to the COS */
> > +        ret = write_psr_ref(socket, cos, mask);
> > +        if ( ret )
> >          {
> > -            ret = write_l3_cbm(socket, cos, cbm_code, cbm_data, cdp_enabled);
> > -            if ( ret )
> > -            {
> > -                spin_unlock(&info->cbm_lock);
> > -                return ret;
> > -            }
> > -            map[cos].code = cbm_code;
> > -            map[cos].data = cbm_data;
> > +            spin_unlock(&info->mask_lock);
> > +            xfree(mask);
> > +            return ret;
> >          }
> >      }
> >  
> >      map[cos].ref++;
> >      map[old_cos].ref--;
> > -    spin_unlock(&info->cbm_lock);
> > +
> > +    spin_unlock(&info->mask_lock);
> >  
> >      d->arch.psr_cos_ids[socket] = cos;
> > +    xfree(mask);
> > +    mask = NULL;
> >  
> >      return 0;
> 
> I don't think setting mask to NULL right ahead of a return statement
> is very helpful.
> 
Yes, this sentence can be removed.

> > -static void cat_cpu_init(void)
> > +static void internal_cpu_init(void)
> 
> Is "internal_" really a useful prefix here?
> 
I wanted to express this is an internal function in psr.c to separate it from
psr_cpu_init().

> >  {
> >      unsigned int eax, ebx, ecx, edx;
> > -    struct psr_cat_socket_info *info;
> > +    struct psr_socket_alloc_info *info;
> >      unsigned int socket;
> >      unsigned int cpu = smp_processor_id();
> > -    uint64_t val;
> >      const struct cpuinfo_x86 *c = cpu_data + cpu;
> > +    struct feat_list *pTmp;
> >  
> >      if ( !cpu_has(c, X86_FEATURE_PQE) || c->cpuid_level < PSR_CPUID_LEVEL_CAT )
> >          return;
> >  
> >      socket = cpu_to_socket(cpu);
> > -    if ( test_bit(socket, cat_socket_enable) )
> > +    info = socket_alloc_info + socket;
> > +    if ( info->features )
> >          return;
> >  
> >      cpuid_count(PSR_CPUID_LEVEL_CAT, 0, &eax, &ebx, &ecx, &edx);
> >      if ( ebx & PSR_RESOURCE_TYPE_L3 )
> >      {
> > -        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> > -        info = cat_socket_info + socket;
> > -        info->cbm_len = (eax & 0x1f) + 1;
> > -        info->cos_max = min(opt_cos_max, edx & 0xffff);
> > -
> > -        info->cos_to_cbm = temp_cos_to_cbm;
> > -        temp_cos_to_cbm = NULL;
> > -        /* cos=0 is reserved as default cbm(all ones). */
> > -        info->cos_to_cbm[0].cbm = (1ull << info->cbm_len) - 1;
> > +        pTmp = pL3CAT;
> > +        if ( !pTmp )
> > +            return;
> 
> You check the allocation result in internal_cpu_prepare() - can you
> really get here with this being NULL? If you can't, drop the if() (or
> make it an ASSERT()).
> 
Ok, thanks!

> > +        pL3CAT = NULL;
> >  
> > -        spin_lock_init(&info->cbm_lock);
> > -
> > -        set_bit(socket, cat_socket_enable);
> > -
> > -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> > -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
> > -        {
> > -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
> > -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
> > -
> > -            /* We only write mask1 since mask0 is always all ones by default. */
> > -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
> > -
> > -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> > -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
> > -
> > -            /* Cut half of cos_max when CDP is enabled. */
> > -            info->cos_max >>= 1;
> > -
> > -            set_bit(socket, cdp_socket_enable);
> > -        }
> > -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u, CDP:%s\n",
> > -               socket, info->cos_max, info->cbm_len,
> > -               cdp_is_enabled(socket) ? "on" : "off");
> > +        /* Initialize this feature according to CPUID. */
> > +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> > +        pTmp->ops = l3_cat_ops;
> > +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
> 
> This function can return without having consumed pTmp, which means
> you'd leak the memory in that case.
> 
Yes, right, this is by design. pL3CAT/pL2CAT are allocated in
internal_cpu_prepare(). If this CPU does not support L2 CAT for example, the
pL2CAT will not be added into feature list. But when CPU on other socket up,
internal_cpu_prepare() will be called again and the pL2CAT will not be
allocated again (check if it is NULL). We can use the buffer has been allocated.
That means we maintain one redundant buffer to simplify the codes.

> > -static void __init psr_cat_free(void)
> > +static void __init psr_free(void)
> >  {
> > -    xfree(cat_socket_enable);
> > -    cat_socket_enable = NULL;
> > -    xfree(cat_socket_info);
> > -    cat_socket_info = NULL;
> > +    int i;
> 
> unsigned int
> 
Will change it. Thanks!

> > -static void __init init_psr_cat(void)
> > +static void __init init_psr(void)
> >  {
> > +    int i;
> 
> Again.
> 
Will change it. Thanks!

> > +
> >      if ( opt_cos_max < 1 )
> >      {
> >          printk(XENLOG_INFO "CAT: disabled, cos_max is too small\n");
> >          return;
> >      }
> >  
> > -    cat_socket_enable = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_sockets));
> > -    cat_socket_info = xzalloc_array(struct psr_cat_socket_info, nr_sockets);
> > -    cdp_socket_enable = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_sockets));
> > +    socket_alloc_info = xzalloc_array(struct psr_socket_alloc_info, nr_sockets);
> > +
> > +    if ( !socket_alloc_info )
> > +    {
> > +        printk(XENLOG_INFO "Fail to alloc socket_alloc_info!\n");
> > +        return;
> > +    }
> >  
> > -    if ( !cat_socket_enable || !cat_socket_info )
> > -        psr_cat_free();
> > +    for ( i = 0; i < nr_sockets; i++ )
> > +    {
> > +        socket_alloc_info[i].pFeat =  xzalloc(struct feat_list);
> 
> What's the point of this dummy list element?
> 
The pFeat is a redundant list head to facilitate list operations.
With this list head, the advantage is that we can insert/delete the first
element same as others without special operations.

> > +        if ( NULL == socket_alloc_info[i].pFeat )
> > +        {
> > +            printk(XENLOG_INFO "Fail to alloc pFeat!\n");
> 
> How would the reader of the log easily associate such a message
> with something having gone wrong in PSR initialization?
> 
Yes, seems not necessary. Will remove it.

> > +            return;
> > +        }
> > +        socket_alloc_info[i].pFeat->pNext = NULL;
> 
> This is redundant with the use of xzalloc().
> 
Will remove it, thanks!

> >  static void psr_cpu_init(void)
> >  {
> > -    if ( cat_socket_info )
> > -        cat_cpu_init();
> > +    unsigned int socket;
> > +    struct psr_socket_alloc_info *info;
> > +
> > +    if ( socket_alloc_info )
> > +    {
> > +        socket = cpu_to_socket(smp_processor_id());
> > +        info = socket_alloc_info + socket;
> > +        info->cos_ref = temp_cos_ref;
> > +        temp_cos_ref = NULL;
> > +        spin_lock_init(&info->mask_lock);
> > +
> > +        internal_cpu_init();
> 
> Why is everything ahead of this call not part of that function?
> 
The original purpose is to make internal_cpu_init() be simpler. After
considering it again, move these codes into the function may be better.
Thanks!

> > --- a/xen/include/asm-x86/psr.h
> > +++ b/xen/include/asm-x86/psr.h
> > @@ -46,10 +46,10 @@ struct psr_cmt {
> >      struct psr_cmt_l3 l3;
> >  };
> >  
> > -enum cbm_type {
> > -    PSR_CBM_TYPE_L3,
> > -    PSR_CBM_TYPE_L3_CODE,
> > -    PSR_CBM_TYPE_L3_DATA,
> > +enum mask_type {
> > +    PSR_MASK_TYPE_L3_CBM,
> > +    PSR_MASK_TYPE_L3_CODE,
> > +    PSR_MASK_TYPE_L3_DATA,
> >  };
> 
> The enum tag is now too generic a name - please make this
> psr_mask_type or something similar.
> 
Ok, thanks!

> Jan
Jan Beulich Sept. 7, 2016, 9:01 a.m. UTC | #3
>> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
>> > +                         struct psr_socket_alloc_info *info);
>> > +    /*
>> > +     * get_old_set_new is used in set value process to get all features'
>> > +     * COS registers values according to original cos id of the domain.
>> > +     * Then, assemble them into an mask array as feature list order.
>> 
>> This sentence in particular doesn't make any sense to me. What
>> follows below also looks like it is in need of improvement.
>> 
> Do you mean the comments are not accurate?

I simply wasn't able to tell, because of not being able to interpret
the sentence.

> How about below description?
>  
> get_old_set_new will traverse all features in list. It is used to do below two
> things:
> 1. get old_cos register value of all supported features and
> 2. set the new value for appointed feature.
> 
> All the values are set into mask array according the traversal order, meaning
> the same order of feature list members.

Sounds reasonable. I suppose the example that you gave right
next wasn't meant to go into the comment.

>> > +    /*
>> > +     * exceed_range is used to check if the input cos id exceeds the
>> > +     * feature's cos_max and if the input mask value is not the default one.
>> > +     * Even if the associated cos exceeds the cos_max, HW can work as default
>> > +     * value. That is the reason we need check is mask value is default one.
>> > +     * If both criteria are fulfilled, that means the input exceeds the
>> > +     * range.
>> > +     */
>> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
>> > +                                 unsigned int cos);
>> 
>> According to the comment this is kind of a predicate, which seems
>> unlikely to return an unsigned value. In fact without a word on the
>> return value I'd expect such to return bool. And I'd also expect the
>> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
>> for compare above I wonder whether come or all of the parameters
>> should be pointers to const (please go over the entire patch and do
>> constification wherever possible/sensible).
>> 
> Yes, you are right. I will change the function type to bool and add const
> for not changed input pointers.
> 
> This function is used to check if the input cos id exceeds the cos_max. If yes
> and the set value is not default value, we should return error. So, I think
> to change the function name to exceed_cos_max(). How do you think?

Okay, except that I continue to think you mean "exceeds".
"exceed_cos_max" to me is kind of a directive, not a predicate.

>> > +#define MAX_FEAT_INFO_SIZE 8
>> > +#define MAX_COS_REG_NUM  128
>> 
>> Are these numbers arbitrary, or based on something?
>> 
> MAX_FEAT_INFO_SIZE is got from the sizeof(struct psr_cat_lvl_info) and
> consider the extension for future feature.

In that case please use that sizeof() in the expression here.

> MAX_COS_REG_NUM is got from spec that the max COS registers number is 128
> for all PSR features so far.

"So far" makes me still wonder: Is this an architectural limit or one
resulting from current (hardware) implementations. In the former
case I don't think a comment is strictly needed, but in the latter
case the choice should be explained.

>> > +struct psr_socket_alloc_info {
>> 
>> I've yet to see whether the "alloc" in the name really makes sense.

And btw., having seen the entire patch I don't think this alloc_ infix
is warranted both here and in the variable name.

>> > +/* Common functions for supporting feature callback functions. */
>> > +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
>> > +{
>> > +    if ( NULL == pHead || NULL == pTmp )
>> > +        return;
>> > +
>> > +    while ( pHead->pNext )
>> > +        pHead = pHead->pNext;
>> > +
>> > +    pHead->pNext = pTmp;
>> > +}
>> 
>> Do you really need custom list management here?
>> 
> It seems xen list interfaces require the input list be a double linked list but
> my list is a single linked list. Furthermore, I only need simple add to tail
> function and free function. So I create custom list management functions.

Unless there's a strong need, I'd like you to go with what is there,
or introduce _generic_ singly linked list management.

>> > +static void free_feature(struct psr_socket_alloc_info *info)
>> > +{
>> > +    struct feat_list *pTmp;
>> > +    struct feat_list *pPrev;
>> > +
>> > +    if ( NULL == info )
>> > +        return;
>> > +
>> > +    if ( NULL == info->pFeat )
>> > +        return;
>> > +
>> > +    pTmp = info->pFeat->pNext;
>> > +    while ( pTmp )
>> > +    {
>> > +        pPrev = pTmp;
>> > +        pTmp = pTmp->pNext;
>> > +        clear_bit(pPrev->feature, &(info->features));
>> > +        xfree(pPrev);
>> > +        pPrev = NULL;
>> > +    }
>> > +}
>> 
>> Why does info->pFeat not get freed here?
>> 
> The info->pFeat is a redundant list head to facilitate list operations.
> It is only freed when socket info is freed.
> 
> With this list head, the advantage is that we can insert/delete the first
> element same as others without special operations.

But it results in less obvious code (and a basically pointless extra
allocation). One more reason to use generic list management
primitives.

>> > +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
>> > +                                unsigned int ecx, unsigned int edx,
>> > +                                struct feat_list *pFeat,
>> > +                                struct psr_socket_alloc_info *info)
>> > +{
>> > +    struct psr_cat_lvl_info l3_cat;
>> > +    unsigned int socket;
>> > +    uint64_t val;
>> > +
>> > +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
>> > +        return;
>> 
>> DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
>> array is one with unsigned int as the base type, which doesn't get
>> reflected here.
>> 
> I will use BUILD_BUG_ON() for the error case.
> 
> As my explanation for MAX_FEAT_INFO_SIZE above, it reflects a size more
> than struct psr_cat_lvl_info. So, if MAX_FEAT_INFO_SIZE is bigger than the
> sizeof(struct psr_cat_lvl_info), the following operations on feat_info[]
> should be good.

In fact the BUILD_BUG_ON() may not be needed if you follow the
suggestion above.

>> > +    }
>> > +
>> > +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
>> 
>> Aforementioned BUILD_BUG_ON() would probably bets be placed
>> right ahead of this memcpy().
>> 
> Can we call BUILD_BUG_ON() at the top of the function to avoid executing above
> process if the size is not right?

Well, the purpose of BUILD_BUG_ON() is what its name implies: It
fails the _build_ if violated, so execution can't possibly come here
(or in fact anywhere) in that case.

>> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
>> > +                               unsigned int cos, bool_t *found)
>> > +{
>> > +    struct psr_cat_lvl_info cat_info;
>> > +    uint64_t l3_def_cbm;
>> > +
>> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
>> 
>> Already here I think this memcpy()ing gets unwieldy. Can't you
>> simply make the structure field a union of all types of interest?
>> 
> Sorry that I am not very clear about your meaning to make a union. Do you mean
> make feat_info a union? If so, it will lose the universality to cover all
> features. Future feature may have different info.

Which is the purpose of a union - you'd simply add a new member
then.

> I think I can replace the memcpy() to directly assign value to cat_info.

No, this copying (done in _many_ places) really should go away.

>> > +                printk(XENLOG_ERR "CDP exceed cos max.\n");
>> 
>> I haven't figured yet at what times this function will get called, but
>> I suspect it's inappropriate for it to log messages.
>> 
>> > +                return -ENOENT;
>> > +            }
>> > +            *found = 1;
>> > +            return 2;
>> 
>> Who or what is 2?
>> 
> CDP occupies 2 members in mask[] array. So, it returns 2 to make codes outside
> the function to skip to next feature's mask value.

I guess that's going to be understandable once the return values
of the hooks get properly explained in their comments.

>> > +    if ( mask[0] == pFeat->cos_reg_val[cos] )
>> > +        *found = 1;
>> > +    else
>> > +        *found = 0;
>> 
>> Please use a simple assignment in cases like this, instead of if/else.
>> 
> Do you mean codes like below?
> 
> *found = 0;
> if ( mask[0] == pFeat->cos_reg_val[cos] )
>     *found = 1;

No.

    *found = (mask[0] == pFeat->cos_reg_val[cos]);

>> > +    /* CAT */
>> > +    if ( old_cos > cat_info.cos_max )
>> > +        mask[0] =  pFeat->cos_reg_val[0];
>> > +    else
>> > +        mask[0] =  pFeat->cos_reg_val[old_cos];
>> 
>> Across the entire function - wouldn't it be easier to range check
>> old_cos once up front, forcing it to zero if the check fails? That
>> would reduce code size to a better readable amount. (And just to
>> avoid any misunderstanding - comments like this also may apply
>> to other functions; I'm not going to repeat them there.)
>> 
> Good idea! Thanks for the suggestion. I will check other functions like 
> this.
> 
>> Also note that there are undue double blanks in here.
>> 
> Do you mean two spaces? I cannot see it in my original patch and codes.

It's even still visible in the quoted text above.

>> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
>> > +        mask[0] = m;
>> 
>> This overwriting behavior also looks quite strange to me. What's
>> the purpose? And if this really is meant to be that way, why is
>> this not (leaving aside the other suggested adjustment)
>> 
>>     if ( type == PSR_MASK_TYPE_L3_CBM )
>>         mask[0] = m;
>>     else if ( old_cos > cat_info.cos_max )
>>         mask[0] = pFeat->cos_reg_val[0];
>>     else
>>         mask[0] = pFeat->cos_reg_val[old_cos];
>> 
>> ?
>> 
> get_old_set_new() is used to do below two things:
> 1. get old_cos register value of all supported features and
> 2. set the new value for appointed feature.
> 
> So, if the appointed feature is L3 CAT, we should set input vallue for it 
> here.

Okay, that answers the "what" aspect, but leaves open _why_ it
needs to be that way.

>> >  static inline void psr_assoc_init(void)
>> >  {
>> >      struct psr_assoc *psra = &this_cpu(psr_assoc);
>> > +    struct psr_socket_alloc_info *info;
>> > +    unsigned int cos_max;
>> > +    unsigned int socket;
>> >  
>> > -    if ( cat_socket_info )
>> > +    if ( socket_alloc_info )
>> >      {
>> > -        unsigned int socket = cpu_to_socket(smp_processor_id());
>> > +        socket = cpu_to_socket(smp_processor_id());
>> 
>> What's the point of moving the variable declaration? Quite the
>> other way around, the new variable you add could - afaict - all
>> live in this more narrow scope. And info could presumably become
>> a pointer to const.
>> 
> I remember it reports warning or error when compiling old codes.

I don't see why it would.

>> > -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
>> > -                        uint32_t *cos_max, uint32_t *flags)
>> > +int psr_get_info(unsigned int socket, enum mask_type type,
>> > +                 uint32_t *dat0, uint32_t *dat1,
>> > +                 uint32_t *dat2)
>> >  {
>> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
>> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
>> > +    struct feat_list *pTmp;
>> >  
>> >      if ( IS_ERR(info) )
>> >          return PTR_ERR(info);
>> >  
>> > -    *cbm_len = info->cbm_len;
>> > -    *cos_max = info->cos_max;
>> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
>> > +       return -ENODEV;
>> >  
>> > -    *flags = 0;
>> > -    if ( cdp_is_enabled(socket) )
>> > -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
>> > +    pTmp = info->pFeat->pNext;
>> > +    while ( pTmp )
>> > +    {
>> > +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
>> 
>> Considering that this is one of the generic ops: What guarantees
>> that 3 elements will suffice going forward? I.e. why is this not an
>> array?
>> 
> Good point. I consider this too. Per all features we know, three parameters
> are sufficient. To simplify codes, I write the function like this. I think
> we can make change if future feature has more requirements.

If you introduce an abstraction layer, you should design it with
abstract requirements in mind, not with the requirements of
currently known implementations.

>> > -    case PSR_CBM_TYPE_L3_DATA:
>> > -        if ( !cdp_enabled )
>> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
>> > -        else
>> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
>> > -        break;
>> > +    pTmp = info->pFeat->pNext;
>> > +    while ( pTmp )
>> > +    {
>> > +        /* mask getting and setting order is same as feature list */
>> > +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
>> > +        if ( ret < 0 )
>> > +            return ret;
>> >  
>> > -    default:
>> > -        ASSERT_UNREACHABLE();
>> > +        mask += ret;
>> 
>> With this - what ensures the array isn't going to be overrun?
>> 
> It seems you missed one line below. The while() loop traverse all features in
> list. It will exit when the last member has been accessed.
> 
> +        pTmp = pTmp->pNext;

No, I didn't miss any line. I'm asking because there's nothing but
programmer discipline keeping array size and list length match up.
That's fragile.

>> And I'm also having some difficulty seeing how an array of outputs
>> matches up with exactly one input (m). Can you explain this please?
>> 
> This while loop will traverse all features in list. It will get all features
> old_cos value back. If the feature is the one to set new value, the input
> new value (m) will replace the old_cos value. All the value are set into mask
> array according the traversal order, meaning the same order of feature list
> members, like L3 CAT as the first and L2 CAT as the second.

This gets us back to the question of _why_ you need this "get all but
set only one" functionality. Plus, wasn't it that CDP requires two
values per feature?

>> > -static void cat_cpu_init(void)
>> > +static void internal_cpu_init(void)
>> 
>> Is "internal_" really a useful prefix here?
>> 
> I wanted to express this is an internal function in psr.c to separate it from
> psr_cpu_init().

The static keyword already says it's an internal one.

>> > +        pL3CAT = NULL;
>> >  
>> > -        spin_lock_init(&info->cbm_lock);
>> > -
>> > -        set_bit(socket, cat_socket_enable);
>> > -
>> > -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
>> > -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
>> > -        {
>> > -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
>> > -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
>> > -
>> > -            /* We only write mask1 since mask0 is always all ones by default. */
>> > -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
>> > -
>> > -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
>> > -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
>> > -
>> > -            /* Cut half of cos_max when CDP is enabled. */
>> > -            info->cos_max >>= 1;
>> > -
>> > -            set_bit(socket, cdp_socket_enable);
>> > -        }
>> > -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u, CDP:%s\n",
>> > -               socket, info->cos_max, info->cbm_len,
>> > -               cdp_is_enabled(socket) ? "on" : "off");
>> > +        /* Initialize this feature according to CPUID. */
>> > +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
>> > +        pTmp->ops = l3_cat_ops;
>> > +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
>> 
>> This function can return without having consumed pTmp, which means
>> you'd leak the memory in that case.
>> 
> Yes, right, this is by design.

I hope not. You shouldn't design in memory leaks.

> pL3CAT/pL2CAT are allocated in
> internal_cpu_prepare(). If this CPU does not support L2 CAT for example, the
> pL2CAT will not be added into feature list. But when CPU on other socket up,
> internal_cpu_prepare() will be called again and the pL2CAT will not be
> allocated again (check if it is NULL). We can use the buffer has been allocated.
> That means we maintain one redundant buffer to simplify the codes.

I don't think that's the case - pL3CAT gets set to NULL further up. But
with you switching to generic list management, this bogus extra list
element will go away anyway, and so will the extra allocation.

Jan
Yi Sun Sept. 8, 2016, 7:28 a.m. UTC | #4
On 16-09-07 03:01:34, Jan Beulich wrote:
> >> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
> >> > +                         struct psr_socket_alloc_info *info);
> >> > +    /*
> >> > +     * get_old_set_new is used in set value process to get all features'
> >> > +     * COS registers values according to original cos id of the domain.
> >> > +     * Then, assemble them into an mask array as feature list order.
> >> 
> >> This sentence in particular doesn't make any sense to me. What
> >> follows below also looks like it is in need of improvement.
> >> 
> > Do you mean the comments are not accurate?
> 
> I simply wasn't able to tell, because of not being able to interpret
> the sentence.
> 
> > How about below description?
> >  
> > get_old_set_new will traverse all features in list. It is used to do below two
> > things:
> > 1. get old_cos register value of all supported features and
> > 2. set the new value for appointed feature.
> > 
> > All the values are set into mask array according the traversal order, meaning
> > the same order of feature list members.
> 
> Sounds reasonable. I suppose the example that you gave right
> next wasn't meant to go into the comment.
> 
Great, will replace the comments with the new sentences. The example will not
go in.

> >> > +    /*
> >> > +     * exceed_range is used to check if the input cos id exceeds the
> >> > +     * feature's cos_max and if the input mask value is not the default one.
> >> > +     * Even if the associated cos exceeds the cos_max, HW can work as default
> >> > +     * value. That is the reason we need check is mask value is default one.
> >> > +     * If both criteria are fulfilled, that means the input exceeds the
> >> > +     * range.
> >> > +     */
> >> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
> >> > +                                 unsigned int cos);
> >> 
> >> According to the comment this is kind of a predicate, which seems
> >> unlikely to return an unsigned value. In fact without a word on the
> >> return value I'd expect such to return bool. And I'd also expect the
> >> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
> >> for compare above I wonder whether come or all of the parameters
> >> should be pointers to const (please go over the entire patch and do
> >> constification wherever possible/sensible).
> >> 
> > Yes, you are right. I will change the function type to bool and add const
> > for not changed input pointers.
> > 
> > This function is used to check if the input cos id exceeds the cos_max. If yes
> > and the set value is not default value, we should return error. So, I think
> > to change the function name to exceed_cos_max(). How do you think?
> 
> Okay, except that I continue to think you mean "exceeds".
> "exceed_cos_max" to me is kind of a directive, not a predicate.
> 
How about "beyond"?

> >> > +#define MAX_FEAT_INFO_SIZE 8
> >> > +#define MAX_COS_REG_NUM  128
> >> 
> >> Are these numbers arbitrary, or based on something?
> >> 
> > MAX_FEAT_INFO_SIZE is got from the sizeof(struct psr_cat_lvl_info) and
> > consider the extension for future feature.
> 
> In that case please use that sizeof() in the expression here.
> 
Sure. Thanks!

> > MAX_COS_REG_NUM is got from spec that the max COS registers number is 128
> > for all PSR features so far.
> 
> "So far" makes me still wonder: Is this an architectural limit or one
> resulting from current (hardware) implementations. In the former
> case I don't think a comment is strictly needed, but in the latter
> case the choice should be explained.
> 
It is the latter case. I will add comment to explain it. Thanks!

> >> > +struct psr_socket_alloc_info {
> >> 
> >> I've yet to see whether the "alloc" in the name really makes sense.
> 
> And btw., having seen the entire patch I don't think this alloc_ infix
> is warranted both here and in the variable name.
> 
Ok, will consider to remove it in codes. Thanks!

> >> > +/* Common functions for supporting feature callback functions. */
> >> > +static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
> >> > +{
> >> > +    if ( NULL == pHead || NULL == pTmp )
> >> > +        return;
> >> > +
> >> > +    while ( pHead->pNext )
> >> > +        pHead = pHead->pNext;
> >> > +
> >> > +    pHead->pNext = pTmp;
> >> > +}
> >> 
> >> Do you really need custom list management here?
> >> 
> > It seems xen list interfaces require the input list be a double linked list but
> > my list is a single linked list. Furthermore, I only need simple add to tail
> > function and free function. So I create custom list management functions.
> 
> Unless there's a strong need, I'd like you to go with what is there,
> or introduce _generic_ singly linked list management.
> 
I will check below list management interfaces to see if I can reuse them.
Thanks!
xen/include/xen/list.h

> >> > +static void free_feature(struct psr_socket_alloc_info *info)
> >> > +{
> >> > +    struct feat_list *pTmp;
> >> > +    struct feat_list *pPrev;
> >> > +
> >> > +    if ( NULL == info )
> >> > +        return;
> >> > +
> >> > +    if ( NULL == info->pFeat )
> >> > +        return;
> >> > +
> >> > +    pTmp = info->pFeat->pNext;
> >> > +    while ( pTmp )
> >> > +    {
> >> > +        pPrev = pTmp;
> >> > +        pTmp = pTmp->pNext;
> >> > +        clear_bit(pPrev->feature, &(info->features));
> >> > +        xfree(pPrev);
> >> > +        pPrev = NULL;
> >> > +    }
> >> > +}
> >> 
> >> Why does info->pFeat not get freed here?
> >> 
> > The info->pFeat is a redundant list head to facilitate list operations.
> > It is only freed when socket info is freed.
> > 
> > With this list head, the advantage is that we can insert/delete the first
> > element same as others without special operations.
> 
> But it results in less obvious code (and a basically pointless extra
> allocation). One more reason to use generic list management
> primitives.
> 
I will check below list management interfaces to see if I can reuse them.
Thanks!
xen/include/xen/list.h

> >> > +static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
> >> > +                                unsigned int ecx, unsigned int edx,
> >> > +                                struct feat_list *pFeat,
> >> > +                                struct psr_socket_alloc_info *info)
> >> > +{
> >> > +    struct psr_cat_lvl_info l3_cat;
> >> > +    unsigned int socket;
> >> > +    uint64_t val;
> >> > +
> >> > +    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
> >> > +        return;
> >> 
> >> DYM BUILD_BUG_ON()? Also the MAX_FEAT_INFO_SIZE dimensioned
> >> array is one with unsigned int as the base type, which doesn't get
> >> reflected here.
> >> 
> > I will use BUILD_BUG_ON() for the error case.
> > 
> > As my explanation for MAX_FEAT_INFO_SIZE above, it reflects a size more
> > than struct psr_cat_lvl_info. So, if MAX_FEAT_INFO_SIZE is bigger than the
> > sizeof(struct psr_cat_lvl_info), the following operations on feat_info[]
> > should be good.
> 
> In fact the BUILD_BUG_ON() may not be needed if you follow the
> suggestion above.
> 
Ok, thanks!

> >> > +    }
> >> > +
> >> > +    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
> >> 
> >> Aforementioned BUILD_BUG_ON() would probably bets be placed
> >> right ahead of this memcpy().
> >> 
> > Can we call BUILD_BUG_ON() at the top of the function to avoid executing above
> > process if the size is not right?
> 
> Well, the purpose of BUILD_BUG_ON() is what its name implies: It
> fails the _build_ if violated, so execution can't possibly come here
> (or in fact anywhere) in that case.
> 
Thanks for the explanation!

> >> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> >> > +                               unsigned int cos, bool_t *found)
> >> > +{
> >> > +    struct psr_cat_lvl_info cat_info;
> >> > +    uint64_t l3_def_cbm;
> >> > +
> >> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> >> 
> >> Already here I think this memcpy()ing gets unwieldy. Can't you
> >> simply make the structure field a union of all types of interest?
> >> 
> > Sorry that I am not very clear about your meaning to make a union. Do you mean
> > make feat_info a union? If so, it will lose the universality to cover all
> > features. Future feature may have different info.
> 
> Which is the purpose of a union - you'd simply add a new member
> then.
>
I guess your idea likes below. Right?
union {
    struct l3_info {
        union {
            uint64_t cbm;
            struct {
                uint64_t code;
                uint64_t data;
            };
        };

    };

    struct l2_info {
        uint64_t cbm;
    };
};
 
My original design is to use this feat_info cover all features and eliminate
the feature's specific properties. If using above union, we still need to
know the current feature is which when handles feat_info. That loses the
abstraction.

If my thought is not right, please correct me. Thanks!

> > I think I can replace the memcpy() to directly assign value to cat_info.
> 
> No, this copying (done in _many_ places) really should go away.
> 
I want to replace memcpy() to below codes.
cat_info.cbm_len = feat_info[0];
cat_info.cos_max = feat_info[1];

> >> > +                printk(XENLOG_ERR "CDP exceed cos max.\n");
> >> 
> >> I haven't figured yet at what times this function will get called, but
> >> I suspect it's inappropriate for it to log messages.
> >> 
> >> > +                return -ENOENT;
> >> > +            }
> >> > +            *found = 1;
> >> > +            return 2;
> >> 
> >> Who or what is 2?
> >> 
> > CDP occupies 2 members in mask[] array. So, it returns 2 to make codes outside
> > the function to skip to next feature's mask value.
> 
> I guess that's going to be understandable once the return values
> of the hooks get properly explained in their comments.
> 
Yes, I will explain return value by adding comments.

> >> > +    if ( mask[0] == pFeat->cos_reg_val[cos] )
> >> > +        *found = 1;
> >> > +    else
> >> > +        *found = 0;
> >> 
> >> Please use a simple assignment in cases like this, instead of if/else.
> >> 
> > Do you mean codes like below?
> > 
> > *found = 0;
> > if ( mask[0] == pFeat->cos_reg_val[cos] )
> >     *found = 1;
> 
> No.
> 
>     *found = (mask[0] == pFeat->cos_reg_val[cos]);
> 
Thank you!

> >> > +    /* CAT */
> >> > +    if ( old_cos > cat_info.cos_max )
> >> > +        mask[0] =  pFeat->cos_reg_val[0];
> >> > +    else
> >> > +        mask[0] =  pFeat->cos_reg_val[old_cos];
> >> 
> >> Across the entire function - wouldn't it be easier to range check
> >> old_cos once up front, forcing it to zero if the check fails? That
> >> would reduce code size to a better readable amount. (And just to
> >> avoid any misunderstanding - comments like this also may apply
> >> to other functions; I'm not going to repeat them there.)
> >> 
> > Good idea! Thanks for the suggestion. I will check other functions like 
> > this.
> > 
> >> Also note that there are undue double blanks in here.
> >> 
> > Do you mean two spaces? I cannot see it in my original patch and codes.
> 
> It's even still visible in the quoted text above.
> 
Oh, really strange. Anyway, I need submit a new version. I will check this
carefully. Thanks!

> >> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
> >> > +        mask[0] = m;
> >> 
> >> This overwriting behavior also looks quite strange to me. What's
> >> the purpose? And if this really is meant to be that way, why is
> >> this not (leaving aside the other suggested adjustment)
> >> 
> >>     if ( type == PSR_MASK_TYPE_L3_CBM )
> >>         mask[0] = m;
> >>     else if ( old_cos > cat_info.cos_max )
> >>         mask[0] = pFeat->cos_reg_val[0];
> >>     else
> >>         mask[0] = pFeat->cos_reg_val[old_cos];
> >> 
> >> ?
> >> 
> > get_old_set_new() is used to do below two things:
> > 1. get old_cos register value of all supported features and
> > 2. set the new value for appointed feature.
> > 
> > So, if the appointed feature is L3 CAT, we should set input vallue for it 
> > here.
> 
> Okay, that answers the "what" aspect, but leaves open _why_ it
> needs to be that way.
> 
A scenario here to help to understand _why_. 

Like the example for explaining get_old_set_new(), old_cos of the domain is 1.
Then, User wants to set L3 CAT CBM to 0x1ff and L2 CAT 0x3f. The original
COS registers like below.

        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | 0x3ff | 0x1ff | ... |
        -------------------------------
L2 CAT  | 0xff  | 0x3f  | 0x3f  | ... |
        -------------------------------

Then, mask array should be assembled in get_old_set_new() like below:
mask[0]: 0x1ff
mask[1]: 0x3f

Then, we can use this mask array to find if there is matching COS through
compare_mask(). We can find COS 2 is the matching one. 

If there is already a COS registers combination (e.g. L3 COS 2 and L2 COS 2)
same as the mask array, we can reuse this combination directly but not to
allocate a new COS ID. By this way, we can save the COS registers. You know,
there is limitation on COS registers number.

Of course, if we cannot find it in existing combinations, we will call
alloc_new_cos() to allocate a new COS to use.

> >> >  static inline void psr_assoc_init(void)
> >> >  {
> >> >      struct psr_assoc *psra = &this_cpu(psr_assoc);
> >> > +    struct psr_socket_alloc_info *info;
> >> > +    unsigned int cos_max;
> >> > +    unsigned int socket;
> >> >  
> >> > -    if ( cat_socket_info )
> >> > +    if ( socket_alloc_info )
> >> >      {
> >> > -        unsigned int socket = cpu_to_socket(smp_processor_id());
> >> > +        socket = cpu_to_socket(smp_processor_id());
> >> 
> >> What's the point of moving the variable declaration? Quite the
> >> other way around, the new variable you add could - afaict - all
> >> live in this more narrow scope. And info could presumably become
> >> a pointer to const.
> >> 
> > I remember it reports warning or error when compiling old codes.
> 
> I don't see why it would.
> 
Hmm, I will check this again. Thanks!

> >> > -int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
> >> > -                        uint32_t *cos_max, uint32_t *flags)
> >> > +int psr_get_info(unsigned int socket, enum mask_type type,
> >> > +                 uint32_t *dat0, uint32_t *dat1,
> >> > +                 uint32_t *dat2)
> >> >  {
> >> > -    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
> >> > +    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
> >> > +    struct feat_list *pTmp;
> >> >  
> >> >      if ( IS_ERR(info) )
> >> >          return PTR_ERR(info);
> >> >  
> >> > -    *cbm_len = info->cbm_len;
> >> > -    *cos_max = info->cos_max;
> >> > +    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
> >> > +       return -ENODEV;
> >> >  
> >> > -    *flags = 0;
> >> > -    if ( cdp_is_enabled(socket) )
> >> > -        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
> >> > +    pTmp = info->pFeat->pNext;
> >> > +    while ( pTmp )
> >> > +    {
> >> > +        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
> >> 
> >> Considering that this is one of the generic ops: What guarantees
> >> that 3 elements will suffice going forward? I.e. why is this not an
> >> array?
> >> 
> > Good point. I consider this too. Per all features we know, three parameters
> > are sufficient. To simplify codes, I write the function like this. I think
> > we can make change if future feature has more requirements.
> 
> If you introduce an abstraction layer, you should design it with
> abstract requirements in mind, not with the requirements of
> currently known implementations.
> 
Ok, will consider to make a general interface.

> >> > -    case PSR_CBM_TYPE_L3_DATA:
> >> > -        if ( !cdp_enabled )
> >> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
> >> > -        else
> >> > -            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
> >> > -        break;
> >> > +    pTmp = info->pFeat->pNext;
> >> > +    while ( pTmp )
> >> > +    {
> >> > +        /* mask getting and setting order is same as feature list */
> >> > +        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
> >> > +        if ( ret < 0 )
> >> > +            return ret;
> >> >  
> >> > -    default:
> >> > -        ASSERT_UNREACHABLE();
> >> > +        mask += ret;
> >> 
> >> With this - what ensures the array isn't going to be overrun?
> >> 
> > It seems you missed one line below. The while() loop traverse all features in
> > list. It will exit when the last member has been accessed.
> > 
> > +        pTmp = pTmp->pNext;
> 
> No, I didn't miss any line. I'm asking because there's nothing but
> programmer discipline keeping array size and list length match up.
> That's fragile.
> 
Sorry, I misunderstood your meaning. I will add a mechanism to check mask array
size. Thanks for pointing this out!

> >> And I'm also having some difficulty seeing how an array of outputs
> >> matches up with exactly one input (m). Can you explain this please?
> >> 
> > This while loop will traverse all features in list. It will get all features
> > old_cos value back. If the feature is the one to set new value, the input
> > new value (m) will replace the old_cos value. All the value are set into mask
> > array according the traversal order, meaning the same order of feature list
> > members, like L3 CAT as the first and L2 CAT as the second.
> 
> This gets us back to the question of _why_ you need this "get all but
> set only one" functionality. Plus, wasn't it that CDP requires two
> values per feature?
> 
Please check above comment about "_why_".

After operation in get_old_set_new(), CDP occupies two elements of mask array.

> >> > -static void cat_cpu_init(void)
> >> > +static void internal_cpu_init(void)
> >> 
> >> Is "internal_" really a useful prefix here?
> >> 
> > I wanted to express this is an internal function in psr.c to separate it from
> > psr_cpu_init().
> 
> The static keyword already says it's an internal one.
> 
Yes, I did not figure out a better function name so I used 'internal'. Will
consider another one if you think this is not appropriate.

> >> > +        pL3CAT = NULL;
> >> >  
> >> > -        spin_lock_init(&info->cbm_lock);
> >> > -
> >> > -        set_bit(socket, cat_socket_enable);
> >> > -
> >> > -        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
> >> > -             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
> >> > -        {
> >> > -            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
> >> > -            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
> >> > -
> >> > -            /* We only write mask1 since mask0 is always all ones by default. */
> >> > -            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
> >> > -
> >> > -            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
> >> > -            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
> >> > -
> >> > -            /* Cut half of cos_max when CDP is enabled. */
> >> > -            info->cos_max >>= 1;
> >> > -
> >> > -            set_bit(socket, cdp_socket_enable);
> >> > -        }
> >> > -        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u, CDP:%s\n",
> >> > -               socket, info->cos_max, info->cbm_len,
> >> > -               cdp_is_enabled(socket) ? "on" : "off");
> >> > +        /* Initialize this feature according to CPUID. */
> >> > +        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
> >> > +        pTmp->ops = l3_cat_ops;
> >> > +        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
> >> 
> >> This function can return without having consumed pTmp, which means
> >> you'd leak the memory in that case.
> >> 
> > Yes, right, this is by design.
> 
> I hope not. You shouldn't design in memory leaks.
> 
Ok, I will free the redundant buffer in free_feature(). Thanks!

> > pL3CAT/pL2CAT are allocated in
> > internal_cpu_prepare(). If this CPU does not support L2 CAT for example, the
> > pL2CAT will not be added into feature list. But when CPU on other socket up,
> > internal_cpu_prepare() will be called again and the pL2CAT will not be
> > allocated again (check if it is NULL). We can use the buffer has been allocated.
> > That means we maintain one redundant buffer to simplify the codes.
> 
> I don't think that's the case - pL3CAT gets set to NULL further up. But
> with you switching to generic list management, this bogus extra list
> element will go away anyway, and so will the extra allocation.
> 
Yes, you are right. For above scenario I described, pL3CAT will be allocated
again because it has been added into feature list and set to NULL. Finally,
it will be freed in free_feature(). So, pL3CAT is not a memory leak.

I am trying to explain the memory leak case, pL2CAT. Because CPU on socket0
does not support L2 CAT, it is not added into feature list and not set to
NULL. But if CPU on socket1 up and it supports L2 CAT, the pL2CAT allocated
before can be used directly without allocating a new buffer. Of course, if
there is no such case, e.g. all CPUs only support L3 CAT, the pL2CAT is leaked.

Per your suggestion that not allowing any memory leak, I will free it in
free_feature().

> Jan
Jan Beulich Sept. 8, 2016, 9:54 a.m. UTC | #5
>>> On 08.09.16 at 09:28, <yi.y.sun@linux.intel.com> wrote:
> On 16-09-07 03:01:34, Jan Beulich wrote:
>> >> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
>> >> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
>> >> > +                                 unsigned int cos);
>> >> 
>> >> According to the comment this is kind of a predicate, which seems
>> >> unlikely to return an unsigned value. In fact without a word on the
>> >> return value I'd expect such to return bool. And I'd also expect the
>> >> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
>> >> for compare above I wonder whether come or all of the parameters
>> >> should be pointers to const (please go over the entire patch and do
>> >> constification wherever possible/sensible).
>> >> 
>> > Yes, you are right. I will change the function type to bool and add const
>> > for not changed input pointers.
>> > 
>> > This function is used to check if the input cos id exceeds the cos_max. If yes
>> > and the set value is not default value, we should return error. So, I think
>> > to change the function name to exceed_cos_max(). How do you think?
>> 
>> Okay, except that I continue to think you mean "exceeds".
>> "exceed_cos_max" to me is kind of a directive, not a predicate.
>> 
> How about "beyond"?

What's wrong with "exceeds"?

>> >> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
>> >> > +                               unsigned int cos, bool_t *found)
>> >> > +{
>> >> > +    struct psr_cat_lvl_info cat_info;
>> >> > +    uint64_t l3_def_cbm;
>> >> > +
>> >> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
>> >> 
>> >> Already here I think this memcpy()ing gets unwieldy. Can't you
>> >> simply make the structure field a union of all types of interest?
>> >> 
>> > Sorry that I am not very clear about your meaning to make a union. Do you mean
>> > make feat_info a union? If so, it will lose the universality to cover all
>> > features. Future feature may have different info.
>> 
>> Which is the purpose of a union - you'd simply add a new member
>> then.
>>
> I guess your idea likes below. Right?
> union {
>     struct l3_info {
>         union {
>             uint64_t cbm;
>             struct {
>                 uint64_t code;
>                 uint64_t data;
>             };
>         };
> 
>     };
> 
>     struct l2_info {
>         uint64_t cbm;
>     };
> };
>  
> My original design is to use this feat_info cover all features and eliminate
> the feature's specific properties. If using above union, we still need to
> know the current feature is which when handles feat_info. That loses the
> abstraction.
> 
> If my thought is not right, please correct me. Thanks!

I don't understand what abstraction you would lose with the above
layout. The memcpy()int you currently do is, I'm sorry to say that,
horrible.

>> > I think I can replace the memcpy() to directly assign value to cat_info.
>> 
>> No, this copying (done in _many_ places) really should go away.
>> 
> I want to replace memcpy() to below codes.
> cat_info.cbm_len = feat_info[0];
> cat_info.cos_max = feat_info[1];

And again do that in a dozen places? No, please don't.

>> >> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
>> >> > +        mask[0] = m;
>> >> 
>> >> This overwriting behavior also looks quite strange to me. What's
>> >> the purpose? And if this really is meant to be that way, why is
>> >> this not (leaving aside the other suggested adjustment)
>> >> 
>> >>     if ( type == PSR_MASK_TYPE_L3_CBM )
>> >>         mask[0] = m;
>> >>     else if ( old_cos > cat_info.cos_max )
>> >>         mask[0] = pFeat->cos_reg_val[0];
>> >>     else
>> >>         mask[0] = pFeat->cos_reg_val[old_cos];
>> >> 
>> >> ?
>> >> 
>> > get_old_set_new() is used to do below two things:
>> > 1. get old_cos register value of all supported features and
>> > 2. set the new value for appointed feature.
>> > 
>> > So, if the appointed feature is L3 CAT, we should set input vallue for it 
>> > here.
>> 
>> Okay, that answers the "what" aspect, but leaves open _why_ it
>> needs to be that way.
>> 
> A scenario here to help to understand _why_. 
> 
> Like the example for explaining get_old_set_new(), old_cos of the domain is 
> 1.
> Then, User wants to set L3 CAT CBM to 0x1ff and L2 CAT 0x3f. The original
> COS registers like below.
> 
>         -------------------------------
>         | COS 0 | COS 1 | COS 2 | ... |
>         -------------------------------
> L3 CAT  | 0x7ff | 0x3ff | 0x1ff | ... |
>         -------------------------------
> L2 CAT  | 0xff  | 0x3f  | 0x3f  | ... |
>         -------------------------------
> 
> Then, mask array should be assembled in get_old_set_new() like below:
> mask[0]: 0x1ff
> mask[1]: 0x3f
> 
> Then, we can use this mask array to find if there is matching COS through
> compare_mask(). We can find COS 2 is the matching one. 
> 
> If there is already a COS registers combination (e.g. L3 COS 2 and L2 COS 2)
> same as the mask array, we can reuse this combination directly but not to
> allocate a new COS ID. By this way, we can save the COS registers. You know,
> there is limitation on COS registers number.
> 
> Of course, if we cannot find it in existing combinations, we will call
> alloc_new_cos() to allocate a new COS to use.

I think I begin to see the purpose, but then again I still can't get
this explanation in line with there just being a single new value to
be set (m). Perhaps it would help if you split the lookup from the
setting or a new value.

Jan
Yi Sun Sept. 9, 2016, 8:09 a.m. UTC | #6
On 16-09-08 03:54:24, Jan Beulich wrote:
> >>> On 08.09.16 at 09:28, <yi.y.sun@linux.intel.com> wrote:
> > On 16-09-07 03:01:34, Jan Beulich wrote:
> >> >> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
> >> >> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
> >> >> > +                                 unsigned int cos);
> >> >> 
> >> >> According to the comment this is kind of a predicate, which seems
> >> >> unlikely to return an unsigned value. In fact without a word on the
> >> >> return value I'd expect such to return bool. And I'd also expect the
> >> >> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
> >> >> for compare above I wonder whether come or all of the parameters
> >> >> should be pointers to const (please go over the entire patch and do
> >> >> constification wherever possible/sensible).
> >> >> 
> >> > Yes, you are right. I will change the function type to bool and add const
> >> > for not changed input pointers.
> >> > 
> >> > This function is used to check if the input cos id exceeds the cos_max. If yes
> >> > and the set value is not default value, we should return error. So, I think
> >> > to change the function name to exceed_cos_max(). How do you think?
> >> 
> >> Okay, except that I continue to think you mean "exceeds".
> >> "exceed_cos_max" to me is kind of a directive, not a predicate.
> >> 
> > How about "beyond"?
> 
> What's wrong with "exceeds"?
> 
Sorry, I may misunderstand you. Maybe "check_exceed_cos_max" is a good name?

> >> >> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> >> >> > +                               unsigned int cos, bool_t *found)
> >> >> > +{
> >> >> > +    struct psr_cat_lvl_info cat_info;
> >> >> > +    uint64_t l3_def_cbm;
> >> >> > +
> >> >> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> >> >> 
> >> >> Already here I think this memcpy()ing gets unwieldy. Can't you
> >> >> simply make the structure field a union of all types of interest?
> >> >> 
> >> > Sorry that I am not very clear about your meaning to make a union. Do you mean
> >> > make feat_info a union? If so, it will lose the universality to cover all
> >> > features. Future feature may have different info.
> >> 
> >> Which is the purpose of a union - you'd simply add a new member
> >> then.
> >>
> > I guess your idea likes below. Right?
> > union {
> >     struct l3_info {
> >         union {
> >             uint64_t cbm;
> >             struct {
> >                 uint64_t code;
> >                 uint64_t data;
> >             };
> >         };
> > 
> >     };
> > 
> >     struct l2_info {
> >         uint64_t cbm;
> >     };
> > };
> >  
> > My original design is to use this feat_info cover all features and eliminate
> > the feature's specific properties. If using above union, we still need to
> > know the current feature is which when handles feat_info. That loses the
> > abstraction.
> > 
> > If my thought is not right, please correct me. Thanks!
> 
> I don't understand what abstraction you would lose with the above
> layout. The memcpy()int you currently do is, I'm sorry to say that,
> horrible.
> 
Sorry to make you feel bad. I will avoid to use memcpy() in the codes.

I think I have got your mean. I will construct a union for feat_info in next
version. Different features can use different members in union. Then, I can
directly assign the struct value without memcpy. Thanks for the guidance!

> >> > I think I can replace the memcpy() to directly assign value to cat_info.
> >> 
> >> No, this copying (done in _many_ places) really should go away.
> >> 
> > I want to replace memcpy() to below codes.
> > cat_info.cbm_len = feat_info[0];
> > cat_info.cos_max = feat_info[1];
> 
> And again do that in a dozen places? No, please don't.
> 
Sure.

> >> >> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
> >> >> > +        mask[0] = m;
> >> >> 
> >> >> This overwriting behavior also looks quite strange to me. What's
> >> >> the purpose? And if this really is meant to be that way, why is
> >> >> this not (leaving aside the other suggested adjustment)
> >> >> 
> >> >>     if ( type == PSR_MASK_TYPE_L3_CBM )
> >> >>         mask[0] = m;
> >> >>     else if ( old_cos > cat_info.cos_max )
> >> >>         mask[0] = pFeat->cos_reg_val[0];
> >> >>     else
> >> >>         mask[0] = pFeat->cos_reg_val[old_cos];
> >> >> 
> >> >> ?
> >> >> 
> >> > get_old_set_new() is used to do below two things:
> >> > 1. get old_cos register value of all supported features and
> >> > 2. set the new value for appointed feature.
> >> > 
> >> > So, if the appointed feature is L3 CAT, we should set input vallue for it 
> >> > here.
> >> 
> >> Okay, that answers the "what" aspect, but leaves open _why_ it
> >> needs to be that way.
> >> 
> > A scenario here to help to understand _why_. 
> > 
> > Like the example for explaining get_old_set_new(), old_cos of the domain is 
> > 1.
> > Then, User wants to set L3 CAT CBM to 0x1ff and L2 CAT 0x3f. The original
> > COS registers like below.
> > 
> >         -------------------------------
> >         | COS 0 | COS 1 | COS 2 | ... |
> >         -------------------------------
> > L3 CAT  | 0x7ff | 0x3ff | 0x1ff | ... |
> >         -------------------------------
> > L2 CAT  | 0xff  | 0x3f  | 0x3f  | ... |
> >         -------------------------------
> > 
> > Then, mask array should be assembled in get_old_set_new() like below:
> > mask[0]: 0x1ff
> > mask[1]: 0x3f
> > 
> > Then, we can use this mask array to find if there is matching COS through
> > compare_mask(). We can find COS 2 is the matching one. 
> > 
> > If there is already a COS registers combination (e.g. L3 COS 2 and L2 COS 2)
> > same as the mask array, we can reuse this combination directly but not to
> > allocate a new COS ID. By this way, we can save the COS registers. You know,
> > there is limitation on COS registers number.
> > 
> > Of course, if we cannot find it in existing combinations, we will call
> > alloc_new_cos() to allocate a new COS to use.
> 
> I think I begin to see the purpose, but then again I still can't get
> this explanation in line with there just being a single new value to
> be set (m). Perhaps it would help if you split the lookup from the
> setting or a new value.
> 
> Jan

Sure, let me introduce it from beginning.

First time, user wants to set L3 CAT of Dom1 to 0x1ff for example. The old_cos
of Dom1 is 0. L3 CAT is the first element of feature list. The COS registers
values are below at this time.

        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | ...   | ...   | ... |
        -------------------------------
L2 CAT  | 0xff  | ...   | ...   | ... |
        -------------------------------

The mask array assembled in get_old_set_new() is:
mask[0]: 0x1ff
mask[1]: 0xff

It cannot find a matching COS so allocate COS 1 to store the value set.
The COS registers values are changed to below now.

        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
        -------------------------------
L2 CAT  | 0xff  | 0xff  | ...   | ... |
        -------------------------------

Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos of Dom2
is 0 too.

The mask array assembled in get_old_set_new() is:
mask[0]: 0x1ff
mask[1]: 0xff

So, it can find a matching COS, COS 1. Then, it can reuse COS 1 for Dom2.
Jan Beulich Sept. 9, 2016, 8:32 a.m. UTC | #7
>>> On 09.09.16 at 10:09, <yi.y.sun@linux.intel.com> wrote:
> On 16-09-08 03:54:24, Jan Beulich wrote:
>> >>> On 08.09.16 at 09:28, <yi.y.sun@linux.intel.com> wrote:
>> > On 16-09-07 03:01:34, Jan Beulich wrote:
>> >> >> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
>> >> >> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
>> >> >> > +                                 unsigned int cos);
>> >> >> 
>> >> >> According to the comment this is kind of a predicate, which seems
>> >> >> unlikely to return an unsigned value. In fact without a word on the
>> >> >> return value I'd expect such to return bool. And I'd also expect the
>> >> >> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
>> >> >> for compare above I wonder whether come or all of the parameters
>> >> >> should be pointers to const (please go over the entire patch and do
>> >> >> constification wherever possible/sensible).
>> >> >> 
>> >> > Yes, you are right. I will change the function type to bool and add const
>> >> > for not changed input pointers.
>> >> > 
>> >> > This function is used to check if the input cos id exceeds the cos_max. If yes
>> >> > and the set value is not default value, we should return error. So, I think
>> >> > to change the function name to exceed_cos_max(). How do you think?
>> >> 
>> >> Okay, except that I continue to think you mean "exceeds".
>> >> "exceed_cos_max" to me is kind of a directive, not a predicate.
>> >> 
>> > How about "beyond"?
>> 
>> What's wrong with "exceeds"?
>> 
> Sorry, I may misunderstand you. Maybe "check_exceed_cos_max" is a good name?

According to my knowledge of English this would still need to be
"check_exceeds_cos_max", and then the check ends up being quite
redundant.

>> >> >> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
>> >> >> > +                               unsigned int cos, bool_t *found)
>> >> >> > +{
>> >> >> > +    struct psr_cat_lvl_info cat_info;
>> >> >> > +    uint64_t l3_def_cbm;
>> >> >> > +
>> >> >> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
>> >> >> 
>> >> >> Already here I think this memcpy()ing gets unwieldy. Can't you
>> >> >> simply make the structure field a union of all types of interest?
>> >> >> 
>> >> > Sorry that I am not very clear about your meaning to make a union. Do you mean
>> >> > make feat_info a union? If so, it will lose the universality to cover all
>> >> > features. Future feature may have different info.
>> >> 
>> >> Which is the purpose of a union - you'd simply add a new member
>> >> then.
>> >>
>> > I guess your idea likes below. Right?
>> > union {
>> >     struct l3_info {
>> >         union {
>> >             uint64_t cbm;
>> >             struct {
>> >                 uint64_t code;
>> >                 uint64_t data;
>> >             };
>> >         };
>> > 
>> >     };
>> > 
>> >     struct l2_info {
>> >         uint64_t cbm;
>> >     };
>> > };
>> >  
>> > My original design is to use this feat_info cover all features and eliminate
>> > the feature's specific properties. If using above union, we still need to
>> > know the current feature is which when handles feat_info. That loses the
>> > abstraction.
>> > 
>> > If my thought is not right, please correct me. Thanks!
>> 
>> I don't understand what abstraction you would lose with the above
>> layout. The memcpy()int you currently do is, I'm sorry to say that,
>> horrible.
>> 
> Sorry to make you feel bad. I will avoid to use memcpy() in the codes.
> 
> I think I have got your mean. I will construct a union for feat_info in next
> version. Different features can use different members in union. Then, I can
> directly assign the struct value without memcpy. Thanks for the guidance!

You shouldn't need to do such assignments in the vast majority of
places you do the memcpy()ing in right now. (And don't forget,
structure assignment in the end is only a type safe variant of
memcpy().)

>> >> >> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
>> >> >> > +        mask[0] = m;
>> >> >> 
>> >> >> This overwriting behavior also looks quite strange to me. What's
>> >> >> the purpose? And if this really is meant to be that way, why is
>> >> >> this not (leaving aside the other suggested adjustment)
>> >> >> 
>> >> >>     if ( type == PSR_MASK_TYPE_L3_CBM )
>> >> >>         mask[0] = m;
>> >> >>     else if ( old_cos > cat_info.cos_max )
>> >> >>         mask[0] = pFeat->cos_reg_val[0];
>> >> >>     else
>> >> >>         mask[0] = pFeat->cos_reg_val[old_cos];
>> >> >> 
>> >> >> ?
>> >> >> 
>> >> > get_old_set_new() is used to do below two things:
>> >> > 1. get old_cos register value of all supported features and
>> >> > 2. set the new value for appointed feature.
>> >> > 
>> >> > So, if the appointed feature is L3 CAT, we should set input vallue for it 
>> >> > here.
>> >> 
>> >> Okay, that answers the "what" aspect, but leaves open _why_ it
>> >> needs to be that way.
>> >> 
>> > A scenario here to help to understand _why_. 
>> > 
>> > Like the example for explaining get_old_set_new(), old_cos of the domain is 
> 
>> > 1.
>> > Then, User wants to set L3 CAT CBM to 0x1ff and L2 CAT 0x3f. The original
>> > COS registers like below.
>> > 
>> >         -------------------------------
>> >         | COS 0 | COS 1 | COS 2 | ... |
>> >         -------------------------------
>> > L3 CAT  | 0x7ff | 0x3ff | 0x1ff | ... |
>> >         -------------------------------
>> > L2 CAT  | 0xff  | 0x3f  | 0x3f  | ... |
>> >         -------------------------------
>> > 
>> > Then, mask array should be assembled in get_old_set_new() like below:
>> > mask[0]: 0x1ff
>> > mask[1]: 0x3f
>> > 
>> > Then, we can use this mask array to find if there is matching COS through
>> > compare_mask(). We can find COS 2 is the matching one. 
>> > 
>> > If there is already a COS registers combination (e.g. L3 COS 2 and L2 COS 
> 2)
>> > same as the mask array, we can reuse this combination directly but not to
>> > allocate a new COS ID. By this way, we can save the COS registers. You 
> know,
>> > there is limitation on COS registers number.
>> > 
>> > Of course, if we cannot find it in existing combinations, we will call
>> > alloc_new_cos() to allocate a new COS to use.
>> 
>> I think I begin to see the purpose, but then again I still can't get
>> this explanation in line with there just being a single new value to
>> be set (m). Perhaps it would help if you split the lookup from the
>> setting or a new value.
>> 
>> Jan
> 
> Sure, let me introduce it from beginning.
> 
> First time, user wants to set L3 CAT of Dom1 to 0x1ff for example. The 
> old_cos
> of Dom1 is 0. L3 CAT is the first element of feature list. The COS registers
> values are below at this time.
> 
>         -------------------------------
>         | COS 0 | COS 1 | COS 2 | ... |
>         -------------------------------
> L3 CAT  | 0x7ff | ...   | ...   | ... |
>         -------------------------------
> L2 CAT  | 0xff  | ...   | ...   | ... |
>         -------------------------------
> 
> The mask array assembled in get_old_set_new() is:
> mask[0]: 0x1ff
> mask[1]: 0xff
> 
> It cannot find a matching COS so allocate COS 1 to store the value set.
> The COS registers values are changed to below now.
> 
>         -------------------------------
>         | COS 0 | COS 1 | COS 2 | ... |
>         -------------------------------
> L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
>         -------------------------------
> L2 CAT  | 0xff  | 0xff  | ...   | ... |
>         -------------------------------
> 
> Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos of Dom2
> is 0 too.
> 
> The mask array assembled in get_old_set_new() is:
> mask[0]: 0x1ff
> mask[1]: 0xff
> 
> So, it can find a matching COS, COS 1. Then, it can reuse COS 1 for Dom2.

So the lookup is then to find an entry with all current values except
for the one intended to be changed. If that's a correct understanding
of mine, then I think the primitives should be more primitive, making
the overall operation
1) get all current values into an array,
2) update the one array slot that's meant to get changed,
3) look for matching entries,
4) allocate new entry if none found.
And with that I'm then not even sure how many of these steps
actually need hooks, as most if not all of this looks pretty
independent of the aspects of the specific CAT variant. Hooks
should be introduced solely for cases where behavior necessarily
differs between variants.

Jan
Ian Jackson Sept. 9, 2016, 10:14 a.m. UTC | #8
Jan Beulich writes ("Re: [PATCH 1/3] x86: refactor psr implementation in hypervisor."):
> On 09.09.16 at 10:09, <yi.y.sun@linux.intel.com> wrote:
> > Sorry, I may misunderstand you. Maybe "check_exceed_cos_max" is a good name?
> 
> According to my knowledge of English this would still need to be
> "check_exceeds_cos_max", and then the check ends up being quite
> redundant.

In the hope of helping to end this subthread: I'm a native speaker of
English and Jan is right.

Ian.
Yi Sun Sept. 12, 2016, 3:26 a.m. UTC | #9
On 16-09-09 02:32:25, Jan Beulich wrote:
> >>> On 09.09.16 at 10:09, <yi.y.sun@linux.intel.com> wrote:
> > On 16-09-08 03:54:24, Jan Beulich wrote:
> >> >>> On 08.09.16 at 09:28, <yi.y.sun@linux.intel.com> wrote:
> >> > On 16-09-07 03:01:34, Jan Beulich wrote:
> >> >> >> >>> On 25.08.16 at 07:22, <yi.y.sun@linux.intel.com> wrote:
> >> >> >> > +    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
> >> >> >> > +                                 unsigned int cos);
> >> >> >> 
> >> >> >> According to the comment this is kind of a predicate, which seems
> >> >> >> unlikely to return an unsigned value. In fact without a word on the
> >> >> >> return value I'd expect such to return bool. And I'd also expect the
> >> >> >> name to reflect the purpose, i.e. "exceeds_name()". Plus just like
> >> >> >> for compare above I wonder whether come or all of the parameters
> >> >> >> should be pointers to const (please go over the entire patch and do
> >> >> >> constification wherever possible/sensible).
> >> >> >> 
> >> >> > Yes, you are right. I will change the function type to bool and add const
> >> >> > for not changed input pointers.
> >> >> > 
> >> >> > This function is used to check if the input cos id exceeds the cos_max. If yes
> >> >> > and the set value is not default value, we should return error. So, I think
> >> >> > to change the function name to exceed_cos_max(). How do you think?
> >> >> 
> >> >> Okay, except that I continue to think you mean "exceeds".
> >> >> "exceed_cos_max" to me is kind of a directive, not a predicate.
> >> >> 
> >> > How about "beyond"?
> >> 
> >> What's wrong with "exceeds"?
> >> 
> > Sorry, I may misunderstand you. Maybe "check_exceed_cos_max" is a good name?
> 
> According to my knowledge of English this would still need to be
> "check_exceeds_cos_max", and then the check ends up being quite
> redundant.
> 
Ok, got it. Thanks!

> >> >> >> > +static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
> >> >> >> > +                               unsigned int cos, bool_t *found)
> >> >> >> > +{
> >> >> >> > +    struct psr_cat_lvl_info cat_info;
> >> >> >> > +    uint64_t l3_def_cbm;
> >> >> >> > +
> >> >> >> > +    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
> >> >> >> 
> >> >> >> Already here I think this memcpy()ing gets unwieldy. Can't you
> >> >> >> simply make the structure field a union of all types of interest?
> >> >> >> 
> >> >> > Sorry that I am not very clear about your meaning to make a union. Do you mean
> >> >> > make feat_info a union? If so, it will lose the universality to cover all
> >> >> > features. Future feature may have different info.
> >> >> 
> >> >> Which is the purpose of a union - you'd simply add a new member
> >> >> then.
> >> >>
> >> > I guess your idea likes below. Right?
> >> > union {
> >> >     struct l3_info {
> >> >         union {
> >> >             uint64_t cbm;
> >> >             struct {
> >> >                 uint64_t code;
> >> >                 uint64_t data;
> >> >             };
> >> >         };
> >> > 
> >> >     };
> >> > 
> >> >     struct l2_info {
> >> >         uint64_t cbm;
> >> >     };
> >> > };
> >> >  
> >> > My original design is to use this feat_info cover all features and eliminate
> >> > the feature's specific properties. If using above union, we still need to
> >> > know the current feature is which when handles feat_info. That loses the
> >> > abstraction.
> >> > 
> >> > If my thought is not right, please correct me. Thanks!
> >> 
> >> I don't understand what abstraction you would lose with the above
> >> layout. The memcpy()int you currently do is, I'm sorry to say that,
> >> horrible.
> >> 
> > Sorry to make you feel bad. I will avoid to use memcpy() in the codes.
> > 
> > I think I have got your mean. I will construct a union for feat_info in next
> > version. Different features can use different members in union. Then, I can
> > directly assign the struct value without memcpy. Thanks for the guidance!
> 
> You shouldn't need to do such assignments in the vast majority of
> places you do the memcpy()ing in right now. (And don't forget,
> structure assignment in the end is only a type safe variant of
> memcpy().)
> 
Got it, thanks!

> >> >> >> > +    if ( type == PSR_MASK_TYPE_L3_CBM )
> >> >> >> > +        mask[0] = m;
> >> >> >> 
> >> >> >> This overwriting behavior also looks quite strange to me. What's
> >> >> >> the purpose? And if this really is meant to be that way, why is
> >> >> >> this not (leaving aside the other suggested adjustment)
> >> >> >> 
> >> >> >>     if ( type == PSR_MASK_TYPE_L3_CBM )
> >> >> >>         mask[0] = m;
> >> >> >>     else if ( old_cos > cat_info.cos_max )
> >> >> >>         mask[0] = pFeat->cos_reg_val[0];
> >> >> >>     else
> >> >> >>         mask[0] = pFeat->cos_reg_val[old_cos];
> >> >> >> 
> >> >> >> ?
> >> >> >> 
> >> >> > get_old_set_new() is used to do below two things:
> >> >> > 1. get old_cos register value of all supported features and
> >> >> > 2. set the new value for appointed feature.
> >> >> > 
> >> >> > So, if the appointed feature is L3 CAT, we should set input vallue for it 
> >> >> > here.
> >> >> 
> >> >> Okay, that answers the "what" aspect, but leaves open _why_ it
> >> >> needs to be that way.
> >> >> 
> >> > A scenario here to help to understand _why_. 
> >> > 
> >> > Like the example for explaining get_old_set_new(), old_cos of the domain is 
> > 
> >> > 1.
> >> > Then, User wants to set L3 CAT CBM to 0x1ff and L2 CAT 0x3f. The original
> >> > COS registers like below.
> >> > 
> >> >         -------------------------------
> >> >         | COS 0 | COS 1 | COS 2 | ... |
> >> >         -------------------------------
> >> > L3 CAT  | 0x7ff | 0x3ff | 0x1ff | ... |
> >> >         -------------------------------
> >> > L2 CAT  | 0xff  | 0x3f  | 0x3f  | ... |
> >> >         -------------------------------
> >> > 
> >> > Then, mask array should be assembled in get_old_set_new() like below:
> >> > mask[0]: 0x1ff
> >> > mask[1]: 0x3f
> >> > 
> >> > Then, we can use this mask array to find if there is matching COS through
> >> > compare_mask(). We can find COS 2 is the matching one. 
> >> > 
> >> > If there is already a COS registers combination (e.g. L3 COS 2 and L2 COS 
> > 2)
> >> > same as the mask array, we can reuse this combination directly but not to
> >> > allocate a new COS ID. By this way, we can save the COS registers. You 
> > know,
> >> > there is limitation on COS registers number.
> >> > 
> >> > Of course, if we cannot find it in existing combinations, we will call
> >> > alloc_new_cos() to allocate a new COS to use.
> >> 
> >> I think I begin to see the purpose, but then again I still can't get
> >> this explanation in line with there just being a single new value to
> >> be set (m). Perhaps it would help if you split the lookup from the
> >> setting or a new value.
> >> 
> >> Jan
> > 
> > Sure, let me introduce it from beginning.
> > 
> > First time, user wants to set L3 CAT of Dom1 to 0x1ff for example. The 
> > old_cos
> > of Dom1 is 0. L3 CAT is the first element of feature list. The COS registers
> > values are below at this time.
> > 
> >         -------------------------------
> >         | COS 0 | COS 1 | COS 2 | ... |
> >         -------------------------------
> > L3 CAT  | 0x7ff | ...   | ...   | ... |
> >         -------------------------------
> > L2 CAT  | 0xff  | ...   | ...   | ... |
> >         -------------------------------
> > 
> > The mask array assembled in get_old_set_new() is:
> > mask[0]: 0x1ff
> > mask[1]: 0xff
> > 
> > It cannot find a matching COS so allocate COS 1 to store the value set.
> > The COS registers values are changed to below now.
> > 
> >         -------------------------------
> >         | COS 0 | COS 1 | COS 2 | ... |
> >         -------------------------------
> > L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
> >         -------------------------------
> > L2 CAT  | 0xff  | 0xff  | ...   | ... |
> >         -------------------------------
> > 
> > Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos of Dom2
> > is 0 too.
> > 
> > The mask array assembled in get_old_set_new() is:
> > mask[0]: 0x1ff
> > mask[1]: 0xff
> > 
> > So, it can find a matching COS, COS 1. Then, it can reuse COS 1 for Dom2.
> 
> So the lookup is then to find an entry with all current values except
> for the one intended to be changed. If that's a correct understanding
> of mine, then I think the primitives should be more primitive, making
> the overall operation
> 1) get all current values into an array,
> 2) update the one array slot that's meant to get changed,
> 3) look for matching entries,
> 4) allocate new entry if none found.
> And with that I'm then not even sure how many of these steps
> actually need hooks, as most if not all of this looks pretty
> independent of the aspects of the specific CAT variant. Hooks
> should be introduced solely for cases where behavior necessarily
> differs between variants.
> 
> Jan
By analyzing the behaviors and requirements of different CAT features, I
divide COS find/allocate process to three steps.
1) get_old_set_new(): get all current values into an array and the value of
the changed feature is replaced by the input value(m). Here, you get an array
which contains all features old value and the new value for changed feature.
2) find_cos(): find if there is a matching COS combination which can match
the input array.
3) alloc_new_cos(): if find_cos() cannot find a matching one, allocate a new
COS id or reuse the one which is only occupied by current domain.

So, our ideas are almost same. Only one exception to split get_old_set_new()
function into two parts. That increases one loop to traverse all features and
most features "set new value" callback functions will return directly because
the type cannot match. And, I think to put value array assembling work into one
function is reasonable. What do you think? Thanks!
Yi Sun Sept. 12, 2016, 3:26 a.m. UTC | #10
On 16-09-09 11:14:50, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH 1/3] x86: refactor psr implementation in hypervisor."):
> > On 09.09.16 at 10:09, <yi.y.sun@linux.intel.com> wrote:
> > > Sorry, I may misunderstand you. Maybe "check_exceed_cos_max" is a good name?
> > 
> > According to my knowledge of English this would still need to be
> > "check_exceeds_cos_max", and then the check ends up being quite
> > redundant.
> 
> In the hope of helping to end this subthread: I'm a native speaker of
> English and Jan is right.
> 
> Ian.
Thank you!
Jan Beulich Sept. 12, 2016, 7:30 a.m. UTC | #11
>>> On 12.09.16 at 05:26, <yi.y.sun@linux.intel.com> wrote:
> On 16-09-09 02:32:25, Jan Beulich wrote:
>> >>> On 09.09.16 at 10:09, <yi.y.sun@linux.intel.com> wrote:
>> > First time, user wants to set L3 CAT of Dom1 to 0x1ff for example. The 
>> > old_cos
>> > of Dom1 is 0. L3 CAT is the first element of feature list. The COS registers
>> > values are below at this time.
>> > 
>> >         -------------------------------
>> >         | COS 0 | COS 1 | COS 2 | ... |
>> >         -------------------------------
>> > L3 CAT  | 0x7ff | ...   | ...   | ... |
>> >         -------------------------------
>> > L2 CAT  | 0xff  | ...   | ...   | ... |
>> >         -------------------------------
>> > 
>> > The mask array assembled in get_old_set_new() is:
>> > mask[0]: 0x1ff
>> > mask[1]: 0xff
>> > 
>> > It cannot find a matching COS so allocate COS 1 to store the value set.
>> > The COS registers values are changed to below now.
>> > 
>> >         -------------------------------
>> >         | COS 0 | COS 1 | COS 2 | ... |
>> >         -------------------------------
>> > L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
>> >         -------------------------------
>> > L2 CAT  | 0xff  | 0xff  | ...   | ... |
>> >         -------------------------------
>> > 
>> > Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos of Dom2
>> > is 0 too.
>> > 
>> > The mask array assembled in get_old_set_new() is:
>> > mask[0]: 0x1ff
>> > mask[1]: 0xff
>> > 
>> > So, it can find a matching COS, COS 1. Then, it can reuse COS 1 for Dom2.
>> 
>> So the lookup is then to find an entry with all current values except
>> for the one intended to be changed. If that's a correct understanding
>> of mine, then I think the primitives should be more primitive, making
>> the overall operation
>> 1) get all current values into an array,
>> 2) update the one array slot that's meant to get changed,
>> 3) look for matching entries,
>> 4) allocate new entry if none found.
>> And with that I'm then not even sure how many of these steps
>> actually need hooks, as most if not all of this looks pretty
>> independent of the aspects of the specific CAT variant. Hooks
>> should be introduced solely for cases where behavior necessarily
>> differs between variants.
>> 
>> Jan
> By analyzing the behaviors and requirements of different CAT features, I
> divide COS find/allocate process to three steps.
> 1) get_old_set_new(): get all current values into an array and the value of
> the changed feature is replaced by the input value(m). Here, you get an 
> array
> which contains all features old value and the new value for changed feature.
> 2) find_cos(): find if there is a matching COS combination which can match
> the input array.
> 3) alloc_new_cos(): if find_cos() cannot find a matching one, allocate a new
> COS id or reuse the one which is only occupied by current domain.
> 
> So, our ideas are almost same. Only one exception to split get_old_set_new()
> function into two parts. That increases one loop to traverse all features and
> most features "set new value" callback functions will return directly because
> the type cannot match. And, I think to put value array assembling work into one
> function is reasonable. What do you think? Thanks!

How many iterations do you expect these loops to have? If it's only
a few, I think the simpler and more clear abstraction would outweigh
the performance benefits, the more that I hope none of this ends up
sitting on a performance critical path in the first place.

Jan
Yi Sun Sept. 12, 2016, 7:53 a.m. UTC | #12
On 16-09-12 01:30:22, Jan Beulich wrote:
> >>> On 12.09.16 at 05:26, <yi.y.sun@linux.intel.com> wrote:
> > On 16-09-09 02:32:25, Jan Beulich wrote:
> >> >>> On 09.09.16 at 10:09, <yi.y.sun@linux.intel.com> wrote:
> >> > First time, user wants to set L3 CAT of Dom1 to 0x1ff for example. The 
> >> > old_cos
> >> > of Dom1 is 0. L3 CAT is the first element of feature list. The COS registers
> >> > values are below at this time.
> >> > 
> >> >         -------------------------------
> >> >         | COS 0 | COS 1 | COS 2 | ... |
> >> >         -------------------------------
> >> > L3 CAT  | 0x7ff | ...   | ...   | ... |
> >> >         -------------------------------
> >> > L2 CAT  | 0xff  | ...   | ...   | ... |
> >> >         -------------------------------
> >> > 
> >> > The mask array assembled in get_old_set_new() is:
> >> > mask[0]: 0x1ff
> >> > mask[1]: 0xff
> >> > 
> >> > It cannot find a matching COS so allocate COS 1 to store the value set.
> >> > The COS registers values are changed to below now.
> >> > 
> >> >         -------------------------------
> >> >         | COS 0 | COS 1 | COS 2 | ... |
> >> >         -------------------------------
> >> > L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
> >> >         -------------------------------
> >> > L2 CAT  | 0xff  | 0xff  | ...   | ... |
> >> >         -------------------------------
> >> > 
> >> > Then, user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos of Dom2
> >> > is 0 too.
> >> > 
> >> > The mask array assembled in get_old_set_new() is:
> >> > mask[0]: 0x1ff
> >> > mask[1]: 0xff
> >> > 
> >> > So, it can find a matching COS, COS 1. Then, it can reuse COS 1 for Dom2.
> >> 
> >> So the lookup is then to find an entry with all current values except
> >> for the one intended to be changed. If that's a correct understanding
> >> of mine, then I think the primitives should be more primitive, making
> >> the overall operation
> >> 1) get all current values into an array,
> >> 2) update the one array slot that's meant to get changed,
> >> 3) look for matching entries,
> >> 4) allocate new entry if none found.
> >> And with that I'm then not even sure how many of these steps
> >> actually need hooks, as most if not all of this looks pretty
> >> independent of the aspects of the specific CAT variant. Hooks
> >> should be introduced solely for cases where behavior necessarily
> >> differs between variants.
> >> 
> >> Jan
> > By analyzing the behaviors and requirements of different CAT features, I
> > divide COS find/allocate process to three steps.
> > 1) get_old_set_new(): get all current values into an array and the value of
> > the changed feature is replaced by the input value(m). Here, you get an 
> > array
> > which contains all features old value and the new value for changed feature.
> > 2) find_cos(): find if there is a matching COS combination which can match
> > the input array.
> > 3) alloc_new_cos(): if find_cos() cannot find a matching one, allocate a new
> > COS id or reuse the one which is only occupied by current domain.
> > 
> > So, our ideas are almost same. Only one exception to split get_old_set_new()
> > function into two parts. That increases one loop to traverse all features and
> > most features "set new value" callback functions will return directly because
> > the type cannot match. And, I think to put value array assembling work into one
> > function is reasonable. What do you think? Thanks!
> 
> How many iterations do you expect these loops to have? If it's only
> a few, I think the simpler and more clear abstraction would outweigh
> the performance benefits, the more that I hope none of this ends up
> sitting on a performance critical path in the first place.
> 
> Jan

So far, only two features and it would not be so many in the future. :)

Will add more callback function to set new value. Thanks!
diff mbox

Patch

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index bed70aa..a51ed2c 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1358,41 +1358,41 @@  long arch_do_domctl(
         switch ( domctl->u.psr_cat_op.cmd )
         {
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3);
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_MASK_TYPE_L3_CBM);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_CODE);
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_MASK_TYPE_L3_CODE);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_DATA);
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_MASK_TYPE_L3_DATA);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
-            ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 &domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3);
+            ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+                              &domctl->u.psr_cat_op.data,
+                              PSR_MASK_TYPE_L3_CBM);
             copyback = 1;
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE:
-            ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 &domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_CODE);
+            ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+                              &domctl->u.psr_cat_op.data,
+                              PSR_MASK_TYPE_L3_CODE);
             copyback = 1;
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA:
-            ret = psr_get_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 &domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_DATA);
+            ret = psr_get_val(d, domctl->u.psr_cat_op.target,
+                              &domctl->u.psr_cat_op.data,
+                              PSR_MASK_TYPE_L3_DATA);
             copyback = 1;
             break;
 
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 0b5073c..a77b55a 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -23,22 +23,116 @@ 
 #define PSR_CAT        (1<<1)
 #define PSR_CDP        (1<<2)
 
-struct psr_cat_cbm {
-    union {
-        uint64_t cbm;
-        struct {
-            uint64_t code;
-            uint64_t data;
-        };
-    };
+/*
+ * To add a new PSR feature, please follow below steps:
+ * 1. Implement feature specific callback functions listed in
+ *    "struct feat_ops";
+ * 2. Implement a feature specific "struct feat_ops" object and register
+ *    feature specific callback functions into it;
+ * 3. Delcare feat_list object for the feature and malloc memory for it
+ *    in internal_cpu_prepare(). Correspondingly, free them in
+ *    free_feature();
+ * 4. Add feature initialization codes in internal_cpu_init().
+ */
+
+struct feat_list;
+struct psr_socket_alloc_info;
+
+/* Every feature enabled should implement such ops and callback functions. */
+struct feat_ops {
+    /*
+     * init_feature is used in cpu initialization process to do feature
+     * specific initialization works.
+     */
+    void (*init_feature)(unsigned int eax, unsigned int ebx,
+                         unsigned int ecx, unsigned int edx,
+                         struct feat_list *pFeat,
+                         struct psr_socket_alloc_info *info);
+    /*
+     * get_old_set_new is used in set value process to get all features'
+     * COS registers values according to original cos id of the domain.
+     * Then, assemble them into an mask array as feature list order.
+     * Then, set the new feature value into feature corresponding position
+     * in the array. This function is used to pair with compare_mask.
+     */
+    int (*get_old_set_new)(uint64_t *mask,
+                           struct feat_list *pFeat,
+                           unsigned int old_cos,
+                           enum mask_type type,
+                           uint64_t m);
+    /*
+     * compare_mask is used to in set value process to compare if the
+     * input mask array can match all the features' COS registers values
+     * according to input cos id.
+     */
+    int (*compare_mask)(uint64_t *mask, struct feat_list *pFeat,
+                        unsigned int cos, bool_t *found);
+    /*
+     * get_cos_max_as_type is used to get the cos_max value of the feature
+     * according to input mask_type.
+     */
+    unsigned int (*get_cos_max_as_type)(struct feat_list *pFeat,
+                                        enum mask_type type);
+    /*
+     * exceed_range is used to check if the input cos id exceeds the
+     * feature's cos_max and if the input mask value is not the default one.
+     * Even if the associated cos exceeds the cos_max, HW can work as default
+     * value. That is the reason we need check is mask value is default one.
+     * If both criteria are fulfilled, that means the input exceeds the
+     * range.
+     */
+    unsigned int (*exceed_range)(uint64_t *mask, struct feat_list *pFeat,
+                                 unsigned int cos);
+    /*
+     * write_msr is used to write feature specific MSR registers.
+     */
+    int (*write_msr)(unsigned int cos, uint64_t *mask,
+                     struct feat_list *pFeat);
+    /*
+     * get_val is used to get feature specific COS register value.
+     */
+    int (*get_val)(struct feat_list *pFeat, unsigned int cos,
+                   enum mask_type type, uint64_t *val);
+    /*
+     * get_feat_info is used to get feature specific HW info.
+     */
+    int (*get_feat_info)(struct feat_list *pFeat, enum mask_type type,
+                         uint32_t *dat0, uint32_t *dat1,
+                         uint32_t *dat2);
+    /*
+     * get_max_cos_max is used to get feature's cos_max.
+     */
+    unsigned int (*get_max_cos_max)(struct feat_list *pFeat);
+};
+
+#define MAX_FEAT_INFO_SIZE 8
+#define MAX_COS_REG_NUM  128
+
+struct feat_list {
+    unsigned int feature;
+    struct feat_ops ops;
+    unsigned int feat_info[MAX_FEAT_INFO_SIZE];
+    uint64_t cos_reg_val[MAX_COS_REG_NUM];
+    struct feat_list *pNext;
+};
+
+struct psr_ref {
     unsigned int ref;
 };
 
-struct psr_cat_socket_info {
-    unsigned int cbm_len;
-    unsigned int cos_max;
-    struct psr_cat_cbm *cos_to_cbm;
-    spinlock_t cbm_lock;
+
+#define PSR_SOCKET_L3_CAT 0
+#define PSR_SOCKET_L3_CDP 1
+
+struct psr_socket_alloc_info {
+    /*
+     * bit 1~0: [01]->L3 CAT-only, [10]->L3 CDP
+     */
+    unsigned int features;
+    unsigned int nr_feat;
+    struct feat_list *pFeat;
+    struct psr_ref *cos_ref;
+    spinlock_t mask_lock;
 };
 
 struct psr_assoc {
@@ -48,9 +142,7 @@  struct psr_assoc {
 
 struct psr_cmt *__read_mostly psr_cmt;
 
-static unsigned long *__read_mostly cat_socket_enable;
-static struct psr_cat_socket_info *__read_mostly cat_socket_info;
-static unsigned long *__read_mostly cdp_socket_enable;
+static struct psr_socket_alloc_info *__read_mostly socket_alloc_info;
 
 static unsigned int opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
@@ -58,8 +150,427 @@  static unsigned int __read_mostly opt_cos_max = 255;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
-static struct psr_cat_cbm *temp_cos_to_cbm;
+static struct psr_ref *temp_cos_ref;
+/* Every feature has its own object. */
+struct feat_list *pL3CAT;
+
+/* Common functions for supporting feature callback functions. */
+static void add_feature(struct feat_list *pHead, struct feat_list *pTmp)
+{
+    if ( NULL == pHead || NULL == pTmp )
+        return;
+
+    while ( pHead->pNext )
+        pHead = pHead->pNext;
+
+    pHead->pNext = pTmp;
+}
+
+static void free_feature(struct psr_socket_alloc_info *info)
+{
+    struct feat_list *pTmp;
+    struct feat_list *pPrev;
+
+    if ( NULL == info )
+        return;
+
+    if ( NULL == info->pFeat )
+        return;
+
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        pPrev = pTmp;
+        pTmp = pTmp->pNext;
+        clear_bit(pPrev->feature, &(info->features));
+        xfree(pPrev);
+        pPrev = NULL;
+    }
+}
+
+static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
+{
+    unsigned int first_bit, zero_bit;
+
+    /* Set bits should only in the range of [0, cbm_len). */
+    if ( cbm & (~0ull << cbm_len) )
+        return 0;
+
+    /* At least one bit need to be set. */
+    if ( cbm == 0 )
+        return 0;
+
+    first_bit = find_first_bit(&cbm, cbm_len);
+    zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
+
+    /* Set bits should be contiguous. */
+    if ( zero_bit < cbm_len &&
+         find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
+        return 0;
+
+    return 1;
+}
+
+/*
+ * Features specific implementations.
+ */
+
+/* CAT/CDP data structure and callback functions implementation. */
+struct psr_cat_lvl_info {
+    unsigned int cbm_len;
+    unsigned int cos_max;
+};
+
+static void l3_cat_init_feature(unsigned int eax, unsigned int ebx,
+                                unsigned int ecx, unsigned int edx,
+                                struct feat_list *pFeat,
+                                struct psr_socket_alloc_info *info)
+{
+    struct psr_cat_lvl_info l3_cat;
+    unsigned int socket;
+    uint64_t val;
+
+    if ( MAX_FEAT_INFO_SIZE < sizeof(struct psr_cat_lvl_info) )
+        return;
+
+    /* No valid value so do not enable feature. */
+    if ( 0 == eax || 0 == edx )
+        return;
+
+    l3_cat.cbm_len = (eax & 0x1f) + 1;
+    l3_cat.cos_max = min(opt_cos_max, edx & 0xffff);
 
+    /* cos=0 is reserved as default cbm(all ones). */
+    pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
+
+    pFeat->feature = PSR_SOCKET_L3_CAT;
+    set_bit(PSR_SOCKET_L3_CAT, &(info->features));
+
+    if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
+         !test_bit(PSR_SOCKET_L3_CDP, &(info->features)) )
+    {
+        /* Data */
+        pFeat->cos_reg_val[0] = (1ull << l3_cat.cbm_len) - 1;
+        /* Code */
+        pFeat->cos_reg_val[1] = (1ull << l3_cat.cbm_len) - 1;
+
+        /* We only write mask1 since mask0 is always all ones by default. */
+        wrmsrl(MSR_IA32_PSR_L3_MASK(1),
+               (1ull << l3_cat.cbm_len) - 1);
+        rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
+        wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
+
+        /* Cut half of cos_max when CDP is enabled. */
+        l3_cat.cos_max >>= 1;
+
+        pFeat->feature = PSR_SOCKET_L3_CDP;
+        set_bit(PSR_SOCKET_L3_CDP, &(info->features));
+        clear_bit(PSR_SOCKET_L3_CAT, &(info->features));
+    }
+
+    memcpy(pFeat->feat_info, &l3_cat, sizeof(struct psr_cat_lvl_info));
+
+    info->nr_feat++;
+
+    /* Add this feature into list. */
+    add_feature(info->pFeat, pFeat);
+
+    socket = cpu_to_socket(smp_processor_id());
+    printk(XENLOG_INFO "L3 CAT: enabled on socket %u, cos_max:%u, cbm_len:%u, CDP:%s\n",
+           socket, pFeat->feat_info[1], pFeat->feat_info[0],
+           test_bit(PSR_SOCKET_L3_CDP, &(info->features)) ? "on" : "off");
+}
+
+static int l3_cat_compare_mask(uint64_t *mask, struct feat_list *pFeat,
+                               unsigned int cos, bool_t *found)
+{
+    struct psr_cat_lvl_info cat_info;
+    uint64_t l3_def_cbm;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
+
+    /* CDP */
+    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
+    {
+        if ( cos > cat_info.cos_max )
+        {
+            if ( mask[0] != l3_def_cbm ||
+                 mask[1] != l3_def_cbm )
+            {
+                *found = 0;
+                printk(XENLOG_ERR "CDP exceed cos max.\n");
+                return -ENOENT;
+            }
+            *found = 1;
+            return 2;
+        }
+
+        if ( mask[0] == pFeat->cos_reg_val[cos * 2] &&
+             mask[1] == pFeat->cos_reg_val[cos * 2 + 1])
+            *found = 1;
+        else
+            *found = 0;
+
+        return 2;
+    }
+
+    /* CAT */
+    if ( cos > cat_info.cos_max )
+    {
+        if ( mask[0] != l3_def_cbm )
+        {
+            *found = 0;
+            printk(XENLOG_ERR "CAT exceed cos max.\n");
+            return -ENOENT;
+        }
+        *found = 1;
+        return 1;
+    }
+
+    if ( mask[0] == pFeat->cos_reg_val[cos] )
+        *found = 1;
+    else
+        *found = 0;
+
+    return 1;
+}
+
+static unsigned int l3_cat_get_cos_max_as_type(struct feat_list *pFeat,
+                                               enum mask_type type)
+{
+    struct psr_cat_lvl_info cat_info;
+
+    if ( type != PSR_MASK_TYPE_L3_DATA &&
+         type != PSR_MASK_TYPE_L3_CODE &&
+         type != PSR_MASK_TYPE_L3_CBM )
+        return 0;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+    return cat_info.cos_max;
+}
+
+static unsigned int l3_cat_exceed_range(uint64_t *mask, struct feat_list *pFeat,
+                                        unsigned int cos)
+{
+    struct psr_cat_lvl_info cat_info;
+    uint64_t l3_def_cbm;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+    l3_def_cbm = (1ull << cat_info.cbm_len) - 1;
+
+    /* CDP */
+    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
+    {
+        if ( cos > cat_info.cos_max )
+            if ( mask[0] != l3_def_cbm ||
+                 mask[1] != l3_def_cbm )
+                /*
+                 * Exceed cos_max and value to set is not default,
+                 * return error.
+                 */
+                return 0;
+
+        return 2;
+    }
+
+    /* CAT */
+    if ( cos > cat_info.cos_max )
+        if ( mask[0] != l3_def_cbm )
+            /*
+             * Exceed cos_max and value to set is not default,
+             * return error.
+             */
+            return 0;
+
+    return 1;
+}
+
+static int l3_cat_write_msr(unsigned int cos, uint64_t *mask,
+                            struct feat_list *pFeat)
+{
+    struct psr_cat_lvl_info cat_info;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+
+    /* CDP */
+    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
+    {
+        /*
+         * If input cos is more than the cos_max of the feature, we should
+         * not set the value.
+         */
+        if ( cos > cat_info.cos_max )
+            return 2;
+
+        /* Data */
+        pFeat->cos_reg_val[cos * 2] = mask[0];
+        /* Code */
+        pFeat->cos_reg_val[cos * 2 + 1] = mask[1];
+
+        wrmsrl(MSR_IA32_PSR_L3_MASK_DATA(cos), mask[0]);
+        wrmsrl(MSR_IA32_PSR_L3_MASK_CODE(cos), mask[1]);
+        return 2;
+    }
+
+    /* CAT */
+    if ( cos > cat_info.cos_max )
+        return 1;
+
+    pFeat->cos_reg_val[cos] = mask[0];
+    wrmsrl(MSR_IA32_PSR_L3_MASK(cos), mask[0]);
+    return 1;
+}
+
+static int l3_cat_get_old_set_new(uint64_t *mask,
+                                  struct feat_list *pFeat,
+                                  unsigned int old_cos,
+                                  enum mask_type type,
+                                  uint64_t m)
+{
+    struct psr_cat_lvl_info cat_info;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+
+    if ( type == PSR_MASK_TYPE_L3_DATA ||
+         type == PSR_MASK_TYPE_L3_CODE ||
+         type == PSR_MASK_TYPE_L3_CBM )
+    {
+        if ( !psr_check_cbm(cat_info.cbm_len, m) )
+            return -EINVAL;
+    }
+
+    if ( ( type == PSR_MASK_TYPE_L3_DATA ||
+         type == PSR_MASK_TYPE_L3_CODE ) &&
+         pFeat->feature != PSR_SOCKET_L3_CDP )
+        return -ENXIO;
+
+    /* CDP */
+    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
+    {
+        if ( old_cos > cat_info.cos_max )
+        {
+            /* Data */
+            mask[0] = pFeat->cos_reg_val[0];
+            /* Code */
+            mask[1] = pFeat->cos_reg_val[1];
+        }
+        else
+        {
+            /* Data */
+            mask[0] = pFeat->cos_reg_val[old_cos * 2];
+            /* Code */
+            mask[1] = pFeat->cos_reg_val[old_cos * 2 + 1];
+        }
+
+        /* Set new mask */
+        if ( type == PSR_MASK_TYPE_L3_DATA )
+            mask[0] = m;
+        if ( type == PSR_MASK_TYPE_L3_CODE )
+            mask[1] = m;
+
+        return 2;
+    }
+
+    /* CAT */
+    if ( old_cos > cat_info.cos_max )
+        mask[0] =  pFeat->cos_reg_val[0];
+    else
+        mask[0] =  pFeat->cos_reg_val[old_cos];
+
+    if ( type == PSR_MASK_TYPE_L3_CBM )
+        mask[0] = m;
+
+    return 1;
+}
+
+static int l3_cat_get_val(struct feat_list *pFeat, unsigned int cos,
+                          enum mask_type type, uint64_t *val)
+{
+    struct psr_cat_lvl_info cat_info;
+
+    if ( type != PSR_MASK_TYPE_L3_DATA &&
+         type != PSR_MASK_TYPE_L3_CODE &&
+         type != PSR_MASK_TYPE_L3_CBM )
+         return 0;
+
+    if ( ( type == PSR_MASK_TYPE_L3_DATA ||
+         type == PSR_MASK_TYPE_L3_CODE ) &&
+         pFeat->feature != PSR_SOCKET_L3_CDP )
+        return -ENXIO;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+
+    /* CDP */
+    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
+    {
+        if ( cos > cat_info.cos_max )
+            *val = pFeat->cos_reg_val[0];
+        else if ( type == PSR_MASK_TYPE_L3_DATA )
+            *val = pFeat->cos_reg_val[cos * 2];
+        else if ( type == PSR_MASK_TYPE_L3_CODE )
+            *val = pFeat->cos_reg_val[cos * 2 + 1];
+
+        return 1;
+    }
+
+    /* CAT */
+    if ( cos > cat_info.cos_max )
+        *val =  pFeat->cos_reg_val[0];
+    else
+        *val =  pFeat->cos_reg_val[cos];
+
+    return 1;
+}
+
+static int l3_cat_get_feat_info(struct feat_list *pFeat, enum mask_type type,
+                                uint32_t *dat0, uint32_t *dat1,
+                                uint32_t *dat2)
+{
+    struct psr_cat_lvl_info cat_info;
+
+    if ( type != PSR_MASK_TYPE_L3_DATA &&
+         type != PSR_MASK_TYPE_L3_CODE &&
+         type != PSR_MASK_TYPE_L3_CBM )
+        return 0;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+
+    *dat0 = cat_info.cbm_len;
+    *dat1 = cat_info.cos_max;
+
+    if ( pFeat->feature == PSR_SOCKET_L3_CDP )
+        *dat2 |= XEN_SYSCTL_PSR_CAT_L3_CDP;
+    else
+        *dat2 = 0;
+
+    return 1;
+}
+
+static unsigned int l3_cat_get_max_cos_max(struct feat_list *pFeat)
+{
+    struct psr_cat_lvl_info cat_info;
+
+    memcpy(&cat_info, pFeat->feat_info, sizeof(struct psr_cat_lvl_info));
+
+    return cat_info.cos_max;
+}
+
+struct feat_ops l3_cat_ops = {
+    .init_feature = l3_cat_init_feature,
+    .get_old_set_new = l3_cat_get_old_set_new,
+    .compare_mask = l3_cat_compare_mask,
+    .get_cos_max_as_type = l3_cat_get_cos_max_as_type,
+    .exceed_range = l3_cat_exceed_range,
+    .write_msr = l3_cat_write_msr,
+    .get_val = l3_cat_get_val,
+    .get_feat_info = l3_cat_get_feat_info,
+    .get_max_cos_max = l3_cat_get_max_cos_max,
+};
+
+/*
+ * Common functions implementation
+ */
 static unsigned int get_socket_cpu(unsigned int socket)
 {
     if ( likely(socket < nr_sockets) )
@@ -209,17 +720,39 @@  void psr_free_rmid(struct domain *d)
     d->arch.psr_rmid = 0;
 }
 
+static inline unsigned int get_max_cos_max(struct psr_socket_alloc_info *info)
+{
+    struct feat_list *pTmp;
+    unsigned int cos_max = 0;
+
+    if ( NULL == info->pFeat || NULL == info->pFeat->pNext )
+        return 0;
+
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        cos_max = max(pTmp->ops.get_max_cos_max(pTmp), cos_max);
+        pTmp = pTmp->pNext;
+    }
+
+    return cos_max;
+}
+
 static inline void psr_assoc_init(void)
 {
     struct psr_assoc *psra = &this_cpu(psr_assoc);
+    struct psr_socket_alloc_info *info;
+    unsigned int cos_max;
+    unsigned int socket;
 
-    if ( cat_socket_info )
+    if ( socket_alloc_info )
     {
-        unsigned int socket = cpu_to_socket(smp_processor_id());
+        socket = cpu_to_socket(smp_processor_id());
+        info = socket_alloc_info + socket;
+        cos_max = get_max_cos_max(info);
 
-        if ( test_bit(socket, cat_socket_enable) )
-            psra->cos_mask = ((1ull << get_count_order(
-                             cat_socket_info[socket].cos_max)) - 1) << 32;
+        if ( info->features )
+            psra->cos_mask = ((1ull << get_count_order(cos_max)) - 1) << 32;
     }
 
     if ( psr_cmt_enabled() || psra->cos_mask )
@@ -256,270 +789,359 @@  void psr_ctxt_switch_to(struct domain *d)
         psra->val = reg;
     }
 }
-static struct psr_cat_socket_info *get_cat_socket_info(unsigned int socket)
+
+static struct psr_socket_alloc_info *get_socket_alloc_info(unsigned int socket)
 {
-    if ( !cat_socket_info )
+    struct psr_socket_alloc_info *info;
+
+    if ( !socket_alloc_info )
         return ERR_PTR(-ENODEV);
 
     if ( socket >= nr_sockets )
         return ERR_PTR(-ENOTSOCK);
 
-    if ( !test_bit(socket, cat_socket_enable) )
-        return ERR_PTR(-ENOENT);
+    info = socket_alloc_info + socket;
 
-    return cat_socket_info + socket;
-}
+    if ( !info->features )
+        return ERR_PTR(-ENOENT);
 
-static inline bool_t cdp_is_enabled(unsigned int socket)
-{
-    return cdp_socket_enable && test_bit(socket, cdp_socket_enable);
+    return info;
 }
 
-int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
-                        uint32_t *cos_max, uint32_t *flags)
+int psr_get_info(unsigned int socket, enum mask_type type,
+                 uint32_t *dat0, uint32_t *dat1,
+                 uint32_t *dat2)
 {
-    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
+    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
+    struct feat_list *pTmp;
 
     if ( IS_ERR(info) )
         return PTR_ERR(info);
 
-    *cbm_len = info->cbm_len;
-    *cos_max = info->cos_max;
+    if ( NULL == info->pFeat || NULL == info->pFeat->pNext)
+       return -ENODEV;
 
-    *flags = 0;
-    if ( cdp_is_enabled(socket) )
-        *flags |= XEN_SYSCTL_PSR_CAT_L3_CDP;
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        if ( pTmp->ops.get_feat_info(pTmp, type, dat0, dat1, dat2) )
+            return 0;
 
-    return 0;
+        pTmp = pTmp->pNext;
+    }
+
+    return -ENODEV;
 }
 
-int psr_get_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t *cbm, enum cbm_type type)
+static int get_old_set_new(uint64_t *mask,
+                           struct psr_socket_alloc_info *info,
+                           unsigned int old_cos,
+                           enum mask_type type,
+                           uint64_t m)
 {
-    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
-    bool_t cdp_enabled = cdp_is_enabled(socket);
-
-    if ( IS_ERR(info) )
-        return PTR_ERR(info);
+    struct feat_list *pTmp;
+    int ret;
 
-    switch ( type )
-    {
-    case PSR_CBM_TYPE_L3:
-        if ( cdp_enabled )
-            return -EXDEV;
-        *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
-        break;
+    if ( !mask )
+        return -EINVAL;
 
-    case PSR_CBM_TYPE_L3_CODE:
-        if ( !cdp_enabled )
-            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
-        else
-            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].code;
-        break;
+    if ( !info || !info->pFeat )
+        return -ENODEV;
 
-    case PSR_CBM_TYPE_L3_DATA:
-        if ( !cdp_enabled )
-            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].cbm;
-        else
-            *cbm = info->cos_to_cbm[d->arch.psr_cos_ids[socket]].data;
-        break;
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        /* mask getting and setting order is same as feature list */
+        ret = pTmp->ops.get_old_set_new(mask, pTmp, old_cos, type, m);
+        if ( ret < 0 )
+            return ret;
 
-    default:
-        ASSERT_UNREACHABLE();
+        mask += ret;
+        pTmp = pTmp->pNext;
     }
 
     return 0;
 }
 
-static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm)
+int psr_get_val(struct domain *d, unsigned int socket,
+                uint64_t *val, enum mask_type type)
 {
-    unsigned int first_bit, zero_bit;
-
-    /* Set bits should only in the range of [0, cbm_len). */
-    if ( cbm & (~0ull << cbm_len) )
-        return 0;
+    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
+    unsigned int cos = d->arch.psr_cos_ids[socket];
+    struct feat_list *pTmp;
+    int ret;
 
-    /* At least one bit need to be set. */
-    if ( cbm == 0 )
-        return 0;
+    if ( IS_ERR(info) )
+        return PTR_ERR(info);
 
-    first_bit = find_first_bit(&cbm, cbm_len);
-    zero_bit = find_next_zero_bit(&cbm, cbm_len, first_bit);
+    if ( NULL == info->pFeat || NULL == info->pFeat->pNext )
+        return -ENODEV;
 
-    /* Set bits should be contiguous. */
-    if ( zero_bit < cbm_len &&
-         find_next_bit(&cbm, cbm_len, zero_bit) < cbm_len )
-        return 0;
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        ret = pTmp->ops.get_val(pTmp, cos, type, val);
+        if ( ret < 0 )
+            return -EINVAL;
+        else if ( ret > 0 )
+            /* Found */
+            break;
+
+        pTmp = pTmp->pNext;
+    }
 
-    return 1;
+    return 0;
 }
 
-struct cos_cbm_info
+struct cos_mask_info
 {
     unsigned int cos;
-    bool_t cdp;
-    uint64_t cbm_code;
-    uint64_t cbm_data;
+    struct feat_list *pFeat;
+    uint64_t *mask;
 };
 
-static void do_write_l3_cbm(void *data)
+static void do_write_psr_ref(void *data)
 {
-    struct cos_cbm_info *info = data;
+    struct cos_mask_info *info = (struct cos_mask_info *)data;
+    unsigned int cos           = info->cos;
+    struct feat_list *pFeat    = info->pFeat;
+    uint64_t *mask             = info->mask;
+    struct feat_list *pTmp;
+    int ret;
+
+    if ( NULL == pFeat || NULL == pFeat->pNext)
+        return;
 
-    if ( info->cdp )
+    pTmp = pFeat->pNext;
+    while ( pTmp )
     {
-        wrmsrl(MSR_IA32_PSR_L3_MASK_CODE(info->cos), info->cbm_code);
-        wrmsrl(MSR_IA32_PSR_L3_MASK_DATA(info->cos), info->cbm_data);
+        ret = pTmp->ops.write_msr(cos, mask, pTmp);
+        if ( ret <= 0)
+            return;
+
+        mask += ret;
+        pTmp = pTmp->pNext;
     }
-    else
-        wrmsrl(MSR_IA32_PSR_L3_MASK(info->cos), info->cbm_code);
 }
 
-static int write_l3_cbm(unsigned int socket, unsigned int cos,
-                        uint64_t cbm_code, uint64_t cbm_data, bool_t cdp)
+static int write_psr_ref(unsigned int socket, unsigned int cos,
+                          uint64_t *mask)
 {
-    struct cos_cbm_info info =
+    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
+
+    struct cos_mask_info data =
     {
         .cos = cos,
-        .cbm_code = cbm_code,
-        .cbm_data = cbm_data,
-        .cdp = cdp,
+        .pFeat = info->pFeat,
+        .mask = mask,
     };
 
     if ( socket == cpu_to_socket(smp_processor_id()) )
-        do_write_l3_cbm(&info);
+        do_write_psr_ref(&data);
     else
     {
         unsigned int cpu = get_socket_cpu(socket);
 
         if ( cpu >= nr_cpu_ids )
             return -ENOTSOCK;
-        on_selected_cpus(cpumask_of(cpu), do_write_l3_cbm, &info, 1);
+        on_selected_cpus(cpumask_of(cpu), do_write_psr_ref, &data, 1);
     }
 
     return 0;
 }
 
-static int find_cos(struct psr_cat_cbm *map, unsigned int cos_max,
-                    uint64_t cbm_code, uint64_t cbm_data, bool_t cdp_enabled)
+static int find_cos(uint64_t *mask, enum mask_type type,
+                    struct psr_socket_alloc_info *info)
 {
     unsigned int cos;
+    struct psr_ref *map = info->cos_ref;
+    struct feat_list *pTmp;
+    uint64_t *pMask = mask;
+    int ret;
+    bool_t found = 0;
+    unsigned int cos_max = 0;
+
+    /* cos_max is the max COS of the feature to be set. */
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
+        if ( cos_max > 0 )
+            break;
+
+        pTmp = pTmp->pNext;
+    }
 
+    pTmp = info->pFeat->pNext;
     for ( cos = 0; cos <= cos_max; cos++ )
     {
-        if ( (map[cos].ref || cos == 0) &&
-             ((!cdp_enabled && map[cos].cbm == cbm_code) ||
-              (cdp_enabled && map[cos].code == cbm_code &&
-                              map[cos].data == cbm_data)) )
+        if ( 0 != cos && 0 == map[cos].ref )
+            continue;
+
+        while ( pTmp )
+        {
+            /*
+             * Compare mask according to feature list order.
+             * We must follow this order because mask is assembled
+             * as this order in get_old_set_new().
+             */
+            ret = pTmp->ops.compare_mask(pMask, pTmp, cos, &found);
+            if ( ret < 0 )
+                return ret;
+
+            /* If fail to match, go to next cos to compare. */
+            if ( !found )
+                break;
+
+            pMask += ret;
+            pTmp = pTmp->pNext;
+        }
+
+        /* Every feature can match, this cos is what we find. */
+        if ( found )
             return cos;
+
+        /* Not found, need find again from beginning. */
+        pTmp = info->pFeat->pNext;
+        pMask = mask;
     }
 
+    printk(XENLOG_INFO "%s: cannot find cos.\n", __func__);
     return -ENOENT;
 }
 
-static int pick_avail_cos(struct psr_cat_cbm *map, unsigned int cos_max,
-                          unsigned int old_cos)
+static int check_exceed_range(uint64_t *mask, struct feat_list *pTmp,
+                              unsigned int cos)
 {
+    unsigned int ret = 0;
+
+    while ( pTmp )
+    {
+        ret = pTmp->ops.exceed_range(mask, pTmp, cos);
+        if ( 0 == ret )
+            return -ENOENT;
+
+        mask += ret;
+        pTmp = pTmp->pNext;
+    }
+
+    return 0;
+}
+
+static int alloc_new_cos(struct psr_socket_alloc_info *info,
+                         uint64_t *mask, unsigned int old_cos,
+                         enum mask_type type)
+{
+    unsigned int cos_max = 0;
+    struct feat_list *pTmp;
     unsigned int cos;
+    struct psr_ref *map = info->cos_ref;
+
+    /*
+     * cos_max is the max COS of the feature to be set.
+     */
+    pTmp = info->pFeat->pNext;
+    while ( pTmp )
+    {
+        cos_max = pTmp->ops.get_cos_max_as_type(pTmp, type);
+        if ( cos_max > 0 )
+            break;
+
+        pTmp = pTmp->pNext;
+    }
+
+    if ( !cos_max )
+        return -ENOENT;
 
     /* If old cos is referred only by the domain, then use it. */
     if ( map[old_cos].ref == 1 && old_cos != 0 )
+    {
+        pTmp = info->pFeat->pNext;
+        if ( check_exceed_range(mask, pTmp, old_cos) )
+            goto find_new;
+
         return old_cos;
+    }
 
+find_new:
     /* Find an unused one other than cos0. */
     for ( cos = 1; cos <= cos_max; cos++ )
+        /*
+         * ref is 0 means this COS is not occupied by other domain and
+         * can be used for current setting.
+         */
         if ( map[cos].ref == 0 )
+        {
+            pTmp = info->pFeat->pNext;
+            if ( check_exceed_range(mask, pTmp, cos) )
+                return -ENOENT;
+
             return cos;
+        }
 
+    printk(XENLOG_INFO "%s: all COSs have been occupied.\n", __func__);
     return -ENOENT;
 }
 
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t cbm, enum cbm_type type)
+int psr_set_val(struct domain *d, unsigned int socket,
+                uint64_t val, enum mask_type type)
 {
-    unsigned int old_cos, cos_max;
+    unsigned int old_cos;
     int cos, ret;
-    uint64_t cbm_data, cbm_code;
-    bool_t cdp_enabled = cdp_is_enabled(socket);
-    struct psr_cat_cbm *map;
-    struct psr_cat_socket_info *info = get_cat_socket_info(socket);
+    struct psr_ref *map;
+    uint64_t *mask;
+    struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
 
     if ( IS_ERR(info) )
         return PTR_ERR(info);
 
-    if ( !psr_check_cbm(info->cbm_len, cbm) )
-        return -EINVAL;
-
-    if ( !cdp_enabled && (type == PSR_CBM_TYPE_L3_CODE ||
-                          type == PSR_CBM_TYPE_L3_DATA) )
-        return -ENXIO;
-
-    cos_max = info->cos_max;
+    map = info->cos_ref;
     old_cos = d->arch.psr_cos_ids[socket];
-    map = info->cos_to_cbm;
-
-    switch ( type )
-    {
-    case PSR_CBM_TYPE_L3:
-        cbm_code = cbm;
-        cbm_data = cbm;
-        break;
-
-    case PSR_CBM_TYPE_L3_CODE:
-        cbm_code = cbm;
-        cbm_data = map[old_cos].data;
-        break;
-
-    case PSR_CBM_TYPE_L3_DATA:
-        cbm_code = map[old_cos].code;
-        cbm_data = cbm;
-        break;
 
-    default:
-        ASSERT_UNREACHABLE();
-        return -EINVAL;
-    }
+    /* Considering CDP liking features */
+    mask = xzalloc_array(uint64_t, info->nr_feat * 2);
+    if ( (ret = get_old_set_new(mask, info, old_cos, type, val)) != 0 )
+        return ret;
 
-    spin_lock(&info->cbm_lock);
-    cos = find_cos(map, cos_max, cbm_code, cbm_data, cdp_enabled);
+    spin_lock(&info->mask_lock);
+    cos = find_cos(mask, type, info);
     if ( cos >= 0 )
     {
         if ( cos == old_cos )
         {
-            spin_unlock(&info->cbm_lock);
+            spin_unlock(&info->mask_lock);
+            xfree(mask);
             return 0;
         }
     }
     else
     {
-        cos = pick_avail_cos(map, cos_max, old_cos);
+        cos = alloc_new_cos(info, mask, old_cos, type);
         if ( cos < 0 )
         {
-            spin_unlock(&info->cbm_lock);
+            spin_unlock(&info->mask_lock);
+            xfree(mask);
             return cos;
         }
 
-        /* We try to avoid writing MSR. */
-        if ( (cdp_enabled &&
-             (map[cos].code != cbm_code || map[cos].data != cbm_data)) ||
-             (!cdp_enabled && map[cos].cbm != cbm_code) )
+        /* Write all features mask MSRs corresponding to the COS */
+        ret = write_psr_ref(socket, cos, mask);
+        if ( ret )
         {
-            ret = write_l3_cbm(socket, cos, cbm_code, cbm_data, cdp_enabled);
-            if ( ret )
-            {
-                spin_unlock(&info->cbm_lock);
-                return ret;
-            }
-            map[cos].code = cbm_code;
-            map[cos].data = cbm_data;
+            spin_unlock(&info->mask_lock);
+            xfree(mask);
+            return ret;
         }
     }
 
     map[cos].ref++;
     map[old_cos].ref--;
-    spin_unlock(&info->cbm_lock);
+
+    spin_unlock(&info->mask_lock);
 
     d->arch.psr_cos_ids[socket] = cos;
+    xfree(mask);
+    mask = NULL;
 
     return 0;
 }
@@ -529,20 +1151,23 @@  static void psr_free_cos(struct domain *d)
 {
     unsigned int socket;
     unsigned int cos;
-    struct psr_cat_socket_info *info;
+    struct psr_socket_alloc_info *info;
 
     if( !d->arch.psr_cos_ids )
         return;
 
-    for_each_set_bit(socket, cat_socket_enable, nr_sockets)
+    for( socket = 0; socket < nr_sockets; socket++)
     {
         if ( (cos = d->arch.psr_cos_ids[socket]) == 0 )
             continue;
 
-        info = cat_socket_info + socket;
-        spin_lock(&info->cbm_lock);
-        info->cos_to_cbm[cos].ref--;
-        spin_unlock(&info->cbm_lock);
+        info = get_socket_alloc_info(socket);
+        if ( IS_ERR(info) )
+            continue;
+
+        spin_lock(&info->mask_lock);
+        info->cos_ref[cos].ref--;
+        spin_unlock(&info->mask_lock);
     }
 
     xfree(d->arch.psr_cos_ids);
@@ -551,7 +1176,7 @@  static void psr_free_cos(struct domain *d)
 
 int psr_domain_init(struct domain *d)
 {
-    if ( cat_socket_info )
+    if ( socket_alloc_info )
     {
         d->arch.psr_cos_ids = xzalloc_array(unsigned int, nr_sockets);
         if ( !d->arch.psr_cos_ids )
@@ -567,137 +1192,147 @@  void psr_domain_free(struct domain *d)
     psr_free_cos(d);
 }
 
-static int cat_cpu_prepare(unsigned int cpu)
+static int internal_cpu_prepare(unsigned int cpu)
 {
-    if ( !cat_socket_info )
+    if ( !socket_alloc_info )
         return 0;
 
-    if ( temp_cos_to_cbm == NULL &&
-         (temp_cos_to_cbm = xzalloc_array(struct psr_cat_cbm,
+    if ( temp_cos_ref == NULL &&
+         (temp_cos_ref = xzalloc_array(struct psr_ref,
                                           opt_cos_max + 1UL)) == NULL )
         return -ENOMEM;
 
+    /* Malloc memory for the global feature head here. */
+    if ( pL3CAT == NULL &&
+         (pL3CAT = xzalloc(struct feat_list)) == NULL )
+        return -ENOMEM;
+
     return 0;
 }
 
-static void cat_cpu_init(void)
+static void internal_cpu_init(void)
 {
     unsigned int eax, ebx, ecx, edx;
-    struct psr_cat_socket_info *info;
+    struct psr_socket_alloc_info *info;
     unsigned int socket;
     unsigned int cpu = smp_processor_id();
-    uint64_t val;
     const struct cpuinfo_x86 *c = cpu_data + cpu;
+    struct feat_list *pTmp;
 
     if ( !cpu_has(c, X86_FEATURE_PQE) || c->cpuid_level < PSR_CPUID_LEVEL_CAT )
         return;
 
     socket = cpu_to_socket(cpu);
-    if ( test_bit(socket, cat_socket_enable) )
+    info = socket_alloc_info + socket;
+    if ( info->features )
         return;
 
     cpuid_count(PSR_CPUID_LEVEL_CAT, 0, &eax, &ebx, &ecx, &edx);
     if ( ebx & PSR_RESOURCE_TYPE_L3 )
     {
-        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
-        info = cat_socket_info + socket;
-        info->cbm_len = (eax & 0x1f) + 1;
-        info->cos_max = min(opt_cos_max, edx & 0xffff);
-
-        info->cos_to_cbm = temp_cos_to_cbm;
-        temp_cos_to_cbm = NULL;
-        /* cos=0 is reserved as default cbm(all ones). */
-        info->cos_to_cbm[0].cbm = (1ull << info->cbm_len) - 1;
+        pTmp = pL3CAT;
+        if ( !pTmp )
+            return;
+        pL3CAT = NULL;
 
-        spin_lock_init(&info->cbm_lock);
-
-        set_bit(socket, cat_socket_enable);
-
-        if ( (ecx & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) &&
-             cdp_socket_enable && !test_bit(socket, cdp_socket_enable) )
-        {
-            info->cos_to_cbm[0].code = (1ull << info->cbm_len) - 1;
-            info->cos_to_cbm[0].data = (1ull << info->cbm_len) - 1;
-
-            /* We only write mask1 since mask0 is always all ones by default. */
-            wrmsrl(MSR_IA32_PSR_L3_MASK(1), (1ull << info->cbm_len) - 1);
-
-            rdmsrl(MSR_IA32_PSR_L3_QOS_CFG, val);
-            wrmsrl(MSR_IA32_PSR_L3_QOS_CFG, val | (1 << PSR_L3_QOS_CDP_ENABLE_BIT));
-
-            /* Cut half of cos_max when CDP is enabled. */
-            info->cos_max >>= 1;
-
-            set_bit(socket, cdp_socket_enable);
-        }
-        printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u, CDP:%s\n",
-               socket, info->cos_max, info->cbm_len,
-               cdp_is_enabled(socket) ? "on" : "off");
+        /* Initialize this feature according to CPUID. */
+        cpuid_count(PSR_CPUID_LEVEL_CAT, 1, &eax, &ebx, &ecx, &edx);
+        pTmp->ops = l3_cat_ops;
+        pTmp->ops.init_feature(eax, ebx, ecx, edx, pTmp, info);
     }
 }
 
-static void cat_cpu_fini(unsigned int cpu)
+static void internal_cpu_fini(unsigned int cpu)
 {
     unsigned int socket = cpu_to_socket(cpu);
 
     if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) )
     {
-        struct psr_cat_socket_info *info = cat_socket_info + socket;
+        struct psr_socket_alloc_info *info = get_socket_alloc_info(socket);
+
+        free_feature(info);
 
-        if ( info->cos_to_cbm )
+        if ( info->cos_ref )
         {
-            xfree(info->cos_to_cbm);
-            info->cos_to_cbm = NULL;
+            xfree(info->cos_ref);
+            info->cos_ref = NULL;
         }
-
-        if ( cdp_is_enabled(socket) )
-            clear_bit(socket, cdp_socket_enable);
-
-        clear_bit(socket, cat_socket_enable);
     }
 }
 
-static void __init psr_cat_free(void)
+static void __init psr_free(void)
 {
-    xfree(cat_socket_enable);
-    cat_socket_enable = NULL;
-    xfree(cat_socket_info);
-    cat_socket_info = NULL;
+    int i;
+
+    for ( i = 0; i < nr_sockets; i++ )
+    {
+        free_feature(&socket_alloc_info[i]);
+        xfree(socket_alloc_info[i].pFeat);
+        socket_alloc_info[i].pFeat = NULL;
+    }
+
+    xfree(socket_alloc_info);
+    socket_alloc_info = NULL;
 }
 
-static void __init init_psr_cat(void)
+static void __init init_psr(void)
 {
+    int i;
+
     if ( opt_cos_max < 1 )
     {
         printk(XENLOG_INFO "CAT: disabled, cos_max is too small\n");
         return;
     }
 
-    cat_socket_enable = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_sockets));
-    cat_socket_info = xzalloc_array(struct psr_cat_socket_info, nr_sockets);
-    cdp_socket_enable = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_sockets));
+    socket_alloc_info = xzalloc_array(struct psr_socket_alloc_info, nr_sockets);
+
+    if ( !socket_alloc_info )
+    {
+        printk(XENLOG_INFO "Fail to alloc socket_alloc_info!\n");
+        return;
+    }
 
-    if ( !cat_socket_enable || !cat_socket_info )
-        psr_cat_free();
+    for ( i = 0; i < nr_sockets; i++ )
+    {
+        socket_alloc_info[i].pFeat =  xzalloc(struct feat_list);
+        if ( NULL == socket_alloc_info[i].pFeat )
+        {
+            printk(XENLOG_INFO "Fail to alloc pFeat!\n");
+            return;
+        }
+        socket_alloc_info[i].pFeat->pNext = NULL;
+    }
 }
 
 static int psr_cpu_prepare(unsigned int cpu)
 {
-    return cat_cpu_prepare(cpu);
+    return internal_cpu_prepare(cpu);
 }
 
 static void psr_cpu_init(void)
 {
-    if ( cat_socket_info )
-        cat_cpu_init();
+    unsigned int socket;
+    struct psr_socket_alloc_info *info;
+
+    if ( socket_alloc_info )
+    {
+        socket = cpu_to_socket(smp_processor_id());
+        info = socket_alloc_info + socket;
+        info->cos_ref = temp_cos_ref;
+        temp_cos_ref = NULL;
+        spin_lock_init(&info->mask_lock);
+
+        internal_cpu_init();
+    }
 
     psr_assoc_init();
 }
 
 static void psr_cpu_fini(unsigned int cpu)
 {
-    if ( cat_socket_info )
-        cat_cpu_fini(cpu);
+    if ( socket_alloc_info )
+        internal_cpu_fini(cpu);
 }
 
 static int cpu_callback(
@@ -739,13 +1374,13 @@  static int __init psr_presmp_init(void)
         init_psr_cmt(opt_rmid_max);
 
     if ( opt_psr & PSR_CAT )
-        init_psr_cat();
+        init_psr();
 
     if ( psr_cpu_prepare(0) )
-        psr_cat_free();
+        psr_free();
 
     psr_cpu_init();
-    if ( psr_cmt_enabled() || cat_socket_info )
+    if ( psr_cmt_enabled() || socket_alloc_info )
         register_cpu_notifier(&cpu_nfb);
 
     return 0;
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 14e7dc7..a143e7a 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -176,10 +176,11 @@  long arch_do_sysctl(
         switch ( sysctl->u.psr_cat_op.cmd )
         {
         case XEN_SYSCTL_PSR_CAT_get_l3_info:
-            ret = psr_get_cat_l3_info(sysctl->u.psr_cat_op.target,
-                                      &sysctl->u.psr_cat_op.u.l3_info.cbm_len,
-                                      &sysctl->u.psr_cat_op.u.l3_info.cos_max,
-                                      &sysctl->u.psr_cat_op.u.l3_info.flags);
+            ret = psr_get_info(sysctl->u.psr_cat_op.target,
+                               PSR_MASK_TYPE_L3_CBM,
+                               &sysctl->u.psr_cat_op.u.l3_info.cbm_len,
+                               &sysctl->u.psr_cat_op.u.l3_info.cos_max,
+                               &sysctl->u.psr_cat_op.u.l3_info.flags);
 
             if ( !ret && __copy_field_to_guest(u_sysctl, sysctl, u.psr_cat_op) )
                 ret = -EFAULT;
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index 57f47e9..0993863 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -46,10 +46,10 @@  struct psr_cmt {
     struct psr_cmt_l3 l3;
 };
 
-enum cbm_type {
-    PSR_CBM_TYPE_L3,
-    PSR_CBM_TYPE_L3_CODE,
-    PSR_CBM_TYPE_L3_DATA,
+enum mask_type {
+    PSR_MASK_TYPE_L3_CBM,
+    PSR_MASK_TYPE_L3_CODE,
+    PSR_MASK_TYPE_L3_DATA,
 };
 
 extern struct psr_cmt *psr_cmt;
@@ -63,12 +63,13 @@  int psr_alloc_rmid(struct domain *d);
 void psr_free_rmid(struct domain *d);
 void psr_ctxt_switch_to(struct domain *d);
 
-int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len,
-                        uint32_t *cos_max, uint32_t *flags);
-int psr_get_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t *cbm, enum cbm_type type);
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t cbm, enum cbm_type type);
+int psr_get_info(unsigned int socket, enum mask_type type,
+                 uint32_t *dat0, uint32_t *dat1,
+                 uint32_t *dat2);
+int psr_get_val(struct domain *d, unsigned int socket,
+                uint64_t *val, enum mask_type type);
+int psr_set_val(struct domain *d, unsigned int socket,
+                uint64_t val, enum mask_type type);
 
 int psr_domain_init(struct domain *d);
 void psr_domain_free(struct domain *d);