
[v10,09/25] x86: refactor psr: L3 CAT: set value: implement framework.

Message ID 1491054836-30488-10-git-send-email-yi.y.sun@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Yi Sun April 1, 2017, 1:53 p.m. UTC
As the set value flow is the most complicated one in PSR, it is split
across several patches to make things clearer. This patch implements
the set value framework first, to show the whole picture.

It also changes the domctl interface to make it more general.

To make the set value flow general and capable of supporting multiple
features at the same time, it includes the steps below:
1. Get the COS ID that the current domain is using.
2. Gather a value array holding every feature's current value, and
   replace the value of the feature being set with the new input value.
3. Check whether there is already a COS ID on which all features'
   values are the same as in the array. If so, this COS ID can be
   reused.
4. If none is found, pick an available COS ID. Only a COS ID whose ref
   is 0 or 1 can be picked.
5. Write the feature's MSR according to the COS ID and cbm_type.
6. Update the ref according to the COS ID.
7. Save the COS ID into the current domain's psr_cos_ids[socket] so that
   we know which COS the domain is using on the socket.

So, some functions are abstracted out here; their feature-specific
callback implementations follow in the next patches.
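
To see how the abstracted helpers map onto the numbered steps, here is a
condensed sketch of the psr_set_val() flow this patch introduces (not
verbatim: error paths, the early exit when the matched COS equals old_cos,
and some parameters are trimmed; the full hunk is quoted in the review
below):

    int psr_set_val(struct domain *d, unsigned int socket,
                    uint32_t val, enum cbm_type type)
    {
        struct psr_socket_info *info = get_socket_info(socket);
        enum psr_feat_type feat_type = psr_cbm_type_to_feat_type(type);
        unsigned int old_cos = d->arch.psr_cos_ids[socket];      /* Step 1 */
        unsigned int array_len = get_cos_num(info);
        uint32_t *val_array = xzalloc_array(uint32_t, array_len);
        int cos, ret;

        /* Step 2: gather all features' current values, insert the new one. */
        ret = gather_val_array(val_array, array_len, info, old_cos);
        if ( !ret )
            ret = insert_val_to_array(val_array, array_len, info,
                                      feat_type, type, val);

        spin_lock(&info->ref_lock);

        /* Step 3: reuse a COS whose values all match the assembled array... */
        cos = find_cos(val_array, array_len, feat_type, info);
        if ( cos < 0 )
        {
            /* Step 4: ...or pick a COS whose ref is 0 or 1... */
            cos = pick_avail_cos(info, val_array, array_len, old_cos, feat_type);

            /* Step 5: ...and program the feature's MSR for that COS. */
            ret = write_psr_msr(socket, cos, val, feat_type);
        }

        /* Step 6: move the reference from old_cos to the chosen cos. */
        info->cos_ref[cos]++;
        info->cos_ref[old_cos]--;
        spin_unlock(&info->ref_lock);

        /* Step 7: remember which COS this domain now uses on this socket. */
        d->arch.psr_cos_ids[socket] = cos;

        xfree(val_array);
        return ret;
    }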

Here is an example to illustrate the process. The CPU supports two
features, e.g. L3 CAT and L2 CAT. The user wants to set Dom1's L3 CAT
to 0x1ff.
1. Initially, Dom1's old_cos is 0. The COS register values at this time
are:
        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | 0x7ff | 0x7ff | ... |
        -------------------------------
L2 CAT  | 0xff  | 0xff  | 0xff  | ... |
        -------------------------------

2. Gather the value array and insert the new value into it:
val[0]: 0x1ff
val[1]: 0xff

3. No matching COS can be found.

4. Pick COS 1 to store the new set of values.

5. Write the L3 CAT COS 1 register. The COS register values now
become:
        -------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        -------------------------------
L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
        -------------------------------
L2 CAT  | 0xff  | 0xff  | ...   | ... |
        -------------------------------

6. The ref[1] is increased to 1 because Dom1 is using it now.

7. Save 1 to Dom1's psr_cos_ids[socket].

Then, the user wants to set Dom2's L3 CAT to 0x1ff too. Dom2's old_cos
is also 0. Repeat the flow above.

The val array assembled is:
val[0]: 0x1ff
val[1]: 0xff

This time a matching COS, COS 1, is found, so COS 1 can be reused
for Dom2.

ref[1] is now increased to 2 because both Dom1 and Dom2 are using this
COS ID. 1 is saved into Dom2's psr_cos_ids[socket].

Another thing that needs to be emphasized is the context switch. When a
context switch happens, 'psr_ctxt_switch_to()' is called to get the
domain's COS ID from 'psr_cos_ids[socket]'. But 'psr_cos_ids[socket]' is
only set at step 7 above. So, there are three scenarios:
1. The user calls the domctl interface on Dom0 to set COS ID 1 for Dom1 into
   its psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
   called, which makes COS ID 1 take effect. For this case, no action is needed.

2. Dom1 runs on CPU 1 and COS ID 1 is in effect. At the same time, the user
   calls the domctl interface on Dom0 to set a new COS ID 2 for Dom1 into
   psr_cos_ids[]. When Dom1 is scheduled again after its time slice ends, the
   new COS ID 2 takes effect.

3. While a new COS ID is being written into psr_cos_ids[], 'psr_ctxt_switch_to()'
   is called and accesses the same psr_cos_ids[] member through 'psr_assoc_cos'.
   The COS ID is constrained by cos_mask, so it cannot exceed cos_max. Even if
   the COS ID read here is stale, it is still a usable ID (within cos_max): the
   functionality keeps working, and only another valid CBM is in effect for a
   short time. At the next schedule, the correct CBM takes effect.

None of these cases causes a race condition or does any harm to the system.
The PSR features set the cache capacity for a domain, and a cache setting only
becomes effective progressively. By the time the cache setting really takes
effect, the time slice scheduling the domain may already have passed. So, even
if a stale COS ID is used to set ASSOC, only another valid CBM is in effect for
a short time while the cache is being prepared; the correct COS ID takes effect
shortly afterwards. This does not noticeably affect the cache capacity setting.
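
For reference, the masking that makes scenario 3 harmless is essentially the
following (a simplified sketch modelled on psr_assoc_cos() in
xen/arch/x86/psr.c, not the verbatim code; the shift reflects the COS field
living in the upper 32 bits of IA32_PQR_ASSOC):

    /* Fold the per-socket COS ID into the ASSOC MSR value on context switch. */
    static void psr_assoc_cos_sketch(uint64_t *reg, unsigned int cos,
                                     uint64_t cos_mask)
    {
        /* cos_mask bounds the ID, so even a stale value stays within cos_max. */
        *reg = (*reg & ~cos_mask) | (((uint64_t)cos << 32) & cos_mask);
    }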

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
v10:
    - restore domain cos id to 0 when socket is offline.
      (suggested by Jan Beulich)
    - check 'psr_cat_op.data' to make sure only lower 32 bits are valid.
      (suggested by Jan Beulich)
    - remove unnecessary fixed width type of parameters and variables.
      (suggested by Jan Beulich)
    - rename 'insert_new_val_to_array' to 'insert_val_to_array'.
      (suggested by Jan Beulich)
    - pass the 'ref_lock' pointer into functions to check that it has been locked.
      (suggested by Jan Beulich)
    - add comment to declare the set process is protected by 'domctl_lock'.
      (suggested by Jan Beulich)
    - check 'feat_type'.
      (suggested by Jan Beulich)
    - remove 'feat_mask'.
      (suggested by Jan Beulich)
    - remove unnecessary criteria of ASSERT.
      (suggested by Jan Beulich)
    - adjust flow of 'psr_set_val' to avoid 'goto' for successful cases.
      (suggested by Jan Beulich)
    - use ASSERT to check 'socket_info' in 'psr_free_cos'.
      (suggested by Jan Beulich)
    - remove unnecessary comment in 'psr_free_cos'.
      (suggested by Jan Beulich)
v9:
    - use goto style error handling in 'psr_set_val'.
      (suggested by Wei Liu)
    - use ASSERT for checking old_cos.
      (suggested by Wei Liu and Jan Beulich)
    - fix coding style issue.
      (suggested by Wei Liu)
    - rename 'assemble_val_array' to 'combine_val_array' in previous patch.
      (suggested by Wei Liu)
    - use 'spin_is_locked' to check ref_lock.
      (suggested by Roger Pau)
    - add an input parameter 'array_len' for 'write_psr_msr'.
    - check 'socket_info' and 'psr_cos_ids' in this patch.
      (suggested by Jan Beulich)
    - modify patch title to indicate 'L3 CAT'.
      (suggested by Jan Beulich)
    - fix commit message words.
      (suggested by Jan Beulich)
    - change 'assemble_val_array' to 'gather_val_array'.
      (suggested by Jan Beulich)
    - change 'set_new_val_to_array' to 'insert_new_val_to_array'.
      (suggested by Jan Beulich)
    - change parameter 'm' of 'insert_new_val_to_array' to 'new_val'.
      (suggested by Jan Beulich)
    - change 'write_psr_msr' to 'write_psr_msrs'.
      (suggested by Jan Beulich)
    - correct comments.
      (suggested by Jan Beulich)
    - remove unnecessary comments.
      (suggested by Jan Beulich)
    - adjust conditions after 'find_cos' to save a level of indentation.
      (suggested by Jan Beulich)
    - add 'ASSERT(!old_cos || ref[old_cos])'.
      (suggested by Jan Beulich)
    - move ASSERT() check into locked region.
      (suggested by Jan Beulich)
    - replace parameter '*val' to 'val[]' in some functions.
      (suggested by Jan Beulich)
    - change 'write_psr_msr' parameters to prepare to only set one new value
      for one feature.
      (suggested by Jan Beulich)
    - changes about 'uint64_t' to 'uint32_t'.
      (suggested by Jan Beulich)
    - add explanation about context switch.
      (suggested by Jan Beulich)
v5:
    - modify commit message.
      (suggested by Jan Beulich)
    - return an error for all helper functions in set flow.
      (suggested by Jan Beulich)
    - remove unnecessary cast.
      (suggested by Jan Beulich)
    - divide 'get_old_set_new' to two functions, 'assemble_val_array' and
      'set_new_val_to_array'.
      (suggested by Jan Beulich)
    - modify comments.
      (suggested by Jan Beulich)
    - adjust code format.
      (suggested by Jan Beulich)
    - change 'alloc_new_cos' to 'pick_avail_cos' to make name accurate.
      (suggested by Jan Beulich)
    - check feature type when entering 'psr_set_val'.
      (suggested by Jan Beulich)
    - use ASSERT to check ref.
      (suggested by Jan Beulich)
    - rename 'dat[]' to 'data[]'.
      (suggested by Jan Beulich)
v4:
    - create this patch to make codes easier to understand.
      (suggested by Jan Beulich)
---
 xen/arch/x86/domctl.c     |  30 +++++--
 xen/arch/x86/psr.c        | 215 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/psr.h |   4 +-
 3 files changed, 236 insertions(+), 13 deletions(-)

Comments

Jan Beulich April 11, 2017, 3:01 p.m. UTC | #1
>>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket)
>  {
>      unsigned int i;
>      struct psr_socket_info *info = socket_info + socket;
> +    struct domain *d;
>  
>      if ( !info )
>          return;
>  
> +    /* Restore domain cos id to 0 when socket is offline. */
> +    for_each_domain ( d )
> +    {
> +        unsigned int cos = d->arch.psr_cos_ids[socket];
> +        if ( cos == 0 )

Blank line between declaration(s) and statement(s) please.

> +            continue;
> +
> +        spin_lock(&info->ref_lock);
> +        ASSERT(!cos || info->cos_ref[cos]);

You've checked cos to be non-zero already above.

> +        info->cos_ref[cos]--;
> +        spin_unlock(&info->ref_lock);
> +
> +        d->arch.psr_cos_ids[socket] = 0;
> +    }

Overall, while you say in the revision log that this was a suggestion of
mine, I don't recall any such (and I've just checked the v9 thread of
this patch without finding anything), and hence it's not really clear to
me why this is needed. After all you should be freeing info anyway
(albeit I can't see this happening, which would look to be a bug in
patch 5), so getting the refcounts adjusted seems pointless in any
event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
certain - this may indeed by unavoidable, to match up with
psr_alloc_cos() using xzalloc.

Furthermore I'm not at all convinced this is appropriate to do in the
context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
have a few thousand VMs, the loop above may take a while.

> @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int socket,
>      return 0;
>  }
>  
> -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> -                   uint64_t cbm, enum cbm_type type)
> +/* Set value functions */
> +static unsigned int get_cos_num(const struct psr_socket_info *info)
>  {
>      return 0;
>  }
>  
> +static int gather_val_array(uint32_t val[],
> +                            unsigned int array_len,
> +                            const struct psr_socket_info *info,
> +                            unsigned int old_cos)
> +{
> +    return -EINVAL;
> +}
> +
> +static int insert_val_to_array(uint32_t val[],

As indicated before, I'm pretty convinced "into" would be more
natural than "to".

> +                               unsigned int array_len,
> +                               const struct psr_socket_info *info,
> +                               enum psr_feat_type feat_type,
> +                               enum cbm_type type,
> +                               uint32_t new_val)
> +{
> +    return -EINVAL;
> +}
> +
> +static int find_cos(const uint32_t val[], unsigned int array_len,
> +                    enum psr_feat_type feat_type,
> +                    const struct psr_socket_info *info,
> +                    spinlock_t *ref_lock)

I don't think I did suggest adding another parameter here. The lock
is accessible from info, isn't it? In which case, as I _did_ suggest,
you should drop const from it instead. But ...

> +{
> +    ASSERT(spin_is_locked(ref_lock));

... the assertion seems rather pointless anyway with there just
being a single caller, which very visibly acquires the lock up front.

> +static int pick_avail_cos(const struct psr_socket_info *info,
> +                          spinlock_t *ref_lock,

Same here then.

> +int psr_set_val(struct domain *d, unsigned int socket,
> +                uint32_t val, enum cbm_type type)
> +{
> +    unsigned int old_cos;
> +    int cos, ret;
> +    unsigned int *ref;
> +    uint32_t *val_array;
> +    struct psr_socket_info *info = get_socket_info(socket);
> +    unsigned int array_len;
> +    enum psr_feat_type feat_type;
> +
> +    if ( IS_ERR(info) )
> +        return PTR_ERR(info);
> +
> +    feat_type = psr_cbm_type_to_feat_type(type);
> +    if ( feat_type > ARRAY_SIZE(info->features) ||

I think I did point out the off-by-one mistake here in an earlier patch
already.

> +         !info->features[feat_type] )
> +        return -ENOENT;
> +
> +    /*
> +     * Step 0:
> +     * old_cos means the COS ID current domain is using. By default, it is 0.
> +     *
> +     * For every COS ID, there is a reference count to record how many domains
> +     * are using the COS register corresponding to this COS ID.
> +     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> +     * - If ref[old_cos] is 1, that means this COS is only used by current
> +     *   domain.
> +     * - If ref[old_cos] is more than 1, that mean multiple domains are using
> +     *   this COS.
> +     */
> +    old_cos = d->arch.psr_cos_ids[socket];
> +    ASSERT(old_cos < MAX_COS_REG_CNT);
> +
> +    ref = info->cos_ref;
> +
> +    /*
> +     * Step 1:
> +     * Gather a value array to store all features cos_reg_val[old_cos].
> +     * And, set the input new val into array according to the feature's
> +     * position in array.
> +     */
> +    array_len = get_cos_num(info);
> +    val_array = xzalloc_array(uint32_t, array_len);
> +    if ( !val_array )
> +        return -ENOMEM;
> +
> +    if ( (ret = gather_val_array(val_array, array_len, info, old_cos)) != 0 )
> +        goto free_array;
> +
> +    if ( (ret = insert_val_to_array(val_array, array_len, info,
> +                                    feat_type, type, val)) != 0 )
> +        goto free_array;
> +
> +    spin_lock(&info->ref_lock);
> +
> +    /*
> +     * Step 2:
> +     * Try to find if there is already a COS ID on which all features' values
> +     * are same as the array. Then, we can reuse this COS ID.
> +     */
> +    cos = find_cos(val_array, array_len, feat_type, info, &info->ref_lock);
> +    if ( cos == old_cos )
> +    {
> +        ret = 0;
> +        goto unlock_free_array;
> +    }
> +
> +    /*
> +     * Step 3:
> +     * If fail to find, we need pick an available COS ID.
> +     * In fact, only COS ID which ref is 1 or 0 can be picked for current
> +     * domain. If old_cos is not 0 and its ref==1, that means only current
> +     * domain is using this old_cos ID. So, this old_cos ID certainly can
> +     * be reused by current domain. Ref==0 means there is no any domain
> +     * using this COS ID. So it can be used for current domain too.
> +     */
> +    if ( cos < 0 )
> +    {
> +        cos = pick_avail_cos(info, &info->ref_lock, val_array,
> +                             array_len, old_cos, feat_type);
> +        if ( cos < 0 )
> +        {
> +            ret = cos;
> +            goto unlock_free_array;
> +        }
> +
> +        /*
> +         * Step 4:
> +         * Write all features MSRs according to the COS ID.
> +         */
> +        ret = write_psr_msr(socket, cos, val, feat_type);
> +        if ( ret )
> +            goto unlock_free_array;
> +    }
> +
> +    /*
> +     * Step 5:
> +     * Find the COS ID (find_cos result is '>= 0' or an available COS ID is
> +     * picked, then update ref according to COS ID.
> +     */
> +    ref[cos]++;
> +    ASSERT(!cos || ref[cos]);
> +    ASSERT(!old_cos || ref[old_cos]);
> +    ref[old_cos]--;
> +    spin_unlock(&info->ref_lock);
> +
> +    /*
> +     * Step 6:
> +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> +     * which COS the domain is using on the socket. One domain can only use
> +     * one COS ID at same time on each socket.
> +     */
> +    d->arch.psr_cos_ids[socket] = cos;
> +

Please put the "free_array" label here instead of duplicating the code
below.

> +    xfree(val_array);
> +    return ret;
> +
> + unlock_free_array:
> +    spin_unlock(&info->ref_lock);
> + free_array:
> +    xfree(val_array);
> +    return ret;
> +}
> +
>  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
>  static void psr_free_cos(struct domain *d)
>  {
> +    unsigned int socket, cos;
> +
> +    ASSERT(socket_info);
> +
> +    if ( !d->arch.psr_cos_ids )
> +        return;
> +
> +    /* Domain is destroied so its cos_ref should be decreased. */
> +    for ( socket = 0; socket < nr_sockets; socket++ )
> +    {
> +        struct psr_socket_info *info;
> +
> +        /* cos 0 is default one which does not need be handled. */
> +        cos = d->arch.psr_cos_ids[socket];
> +        if ( cos == 0 )
> +            continue;
> +
> +        info = socket_info + socket;
> +        spin_lock(&info->ref_lock);
> +        ASSERT(!cos || info->cos_ref[cos]);
> +        info->cos_ref[cos]--;

This recurring pattern of assertion and decrement could surely be
put in a helper function (and the for symmetry also for increment).

Jan
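
A possible shape for the suggested helpers, sketched purely for illustration
(hypothetical names, not part of this series):

    /* Caller holds info->ref_lock for both helpers. */
    static void cos_ref_get(struct psr_socket_info *info, unsigned int cos)
    {
        info->cos_ref[cos]++;
        ASSERT(!cos || info->cos_ref[cos]);   /* catch counter wrap-around */
    }

    static void cos_ref_put(struct psr_socket_info *info, unsigned int cos)
    {
        ASSERT(!cos || info->cos_ref[cos]);   /* must still be referenced */
        info->cos_ref[cos]--;
    }
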
Yi Sun April 12, 2017, 5:53 a.m. UTC | #2
On 17-04-11 09:01:53, Jan Beulich wrote:
> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket)
> >  {
> >      unsigned int i;
> >      struct psr_socket_info *info = socket_info + socket;
> > +    struct domain *d;
> >  
> >      if ( !info )
> >          return;
> >  
> > +    /* Restore domain cos id to 0 when socket is offline. */
> > +    for_each_domain ( d )
> > +    {
> > +        unsigned int cos = d->arch.psr_cos_ids[socket];
> > +        if ( cos == 0 )
> 
> Blank line between declaration(s) and statement(s) please.
> 
Ok, will modify other places where have same issue.

> > +            continue;
> > +
> > +        spin_lock(&info->ref_lock);
> > +        ASSERT(!cos || info->cos_ref[cos]);
> 
> You've checked cos to be non-zero already above.
> 
Yep. Thanks!

> > +        info->cos_ref[cos]--;
> > +        spin_unlock(&info->ref_lock);
> > +
> > +        d->arch.psr_cos_ids[socket] = 0;
> > +    }
> 
> Overall, while you say in the revision log that this was a suggestion of
> mine, I don't recall any such (and I've just checked the v9 thread of
> this patch without finding anything), and hence it's not really clear to
> me why this is needed. After all you should be freeing info anyway

We discussed this in v9 05 patch. I paste it below for your convenience to
check.
[Sun Yi]:
  > So, you think the MSRs values may not be valid after such process and 
  > reloading (write MSRs to default value) is needed. If so, I would like 
  > to do more operations in 'free_feature()':
  > 1. Iterate all domains working on the offline socket to change
  >    'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
  >    status.
  > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
  > 
  > These can make the socket's info be totally restored back to init status.

[Jan]
  Yes, that's what I think is needed.

> (albeit I can't see this happening, which would look to be a bug in
> patch 5), so getting the refcounts adjusted seems pointless in any
> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not

We only free the resources in 'socket_info[socket]' but do not free the
entry itself. Below is how we allocate 'socket_info'. So, 'socket_info[socket]'
is not a pointer that can be freed directly.
  socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);

That is the reason for decrementing 'info->cos_ref[cos]'.

> certain - this may indeed by unavoidable, to match up with
> psr_alloc_cos() using xzalloc.
> 
> Furthermore I'm not at all convinced this is appropriate to do in the
> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> have a few thousand VMs, the loop above may take a while.
> 
Hmm, that may be a potential issue. I have two proposals below. Could you
please help to check which one you prefer? Or provide another solution?

1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]'
   of all domains. The action is protected by 'ref_lock' to avoid conflicts
   with 'psr_set_val'. We can decrement 'info->cos_ref[cos]' in the tasklet or
   memset the array to 0 in free_socket_resources().

2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change the index
   from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
   and can memset the array to 0 when the socket goes offline. But the issue is
   that we do not know how many members this array should have. I cannot find
   a macro like 'DOMAIN_MAX_NUMBER'. So, I prefer to reallocate in
   'psr_alloc_cos' if the newly created domain's id is bigger than the current
   array size.

> > @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int socket,
> >      return 0;
> >  }
> >  
> > -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> > -                   uint64_t cbm, enum cbm_type type)
> > +/* Set value functions */
> > +static unsigned int get_cos_num(const struct psr_socket_info *info)
> >  {
> >      return 0;
> >  }
> >  
> > +static int gather_val_array(uint32_t val[],
> > +                            unsigned int array_len,
> > +                            const struct psr_socket_info *info,
> > +                            unsigned int old_cos)
> > +{
> > +    return -EINVAL;
> > +}
> > +
> > +static int insert_val_to_array(uint32_t val[],
> 
> As indicated before, I'm pretty convinced "into" would be more
> natural than "to".
> 
Got it. Thanks!

> > +                               unsigned int array_len,
> > +                               const struct psr_socket_info *info,
> > +                               enum psr_feat_type feat_type,
> > +                               enum cbm_type type,
> > +                               uint32_t new_val)
> > +{
> > +    return -EINVAL;
> > +}
> > +
> > +static int find_cos(const uint32_t val[], unsigned int array_len,
> > +                    enum psr_feat_type feat_type,
> > +                    const struct psr_socket_info *info,
> > +                    spinlock_t *ref_lock)
> 
> I don't think I did suggest adding another parameter here. The lock
> is accessible from info, isn't it? In which case, as I _did_ suggest,
> you should drop const from it instead. But ...
> 
> > +{
> > +    ASSERT(spin_is_locked(ref_lock));
> 
> ... the assertion seems rather pointless anyway with there just
> being a single caller, which very visibly acquires the lock up front.
> 
> > +static int pick_avail_cos(const struct psr_socket_info *info,
> > +                          spinlock_t *ref_lock,
> 
> Same here then.
> 
Ok, I will drop this assertion and the parameter 'ref_lock'.

> > +int psr_set_val(struct domain *d, unsigned int socket,
> > +                uint32_t val, enum cbm_type type)
> > +{
> > +    unsigned int old_cos;
> > +    int cos, ret;
> > +    unsigned int *ref;
> > +    uint32_t *val_array;
> > +    struct psr_socket_info *info = get_socket_info(socket);
> > +    unsigned int array_len;
> > +    enum psr_feat_type feat_type;
> > +
> > +    if ( IS_ERR(info) )
> > +        return PTR_ERR(info);
> > +
> > +    feat_type = psr_cbm_type_to_feat_type(type);
> > +    if ( feat_type > ARRAY_SIZE(info->features) ||
> 
> I think I did point out the off-by-one mistake here in an earlier patch
> already.
> 
Sorry, I did not notice it.

[...]
> > +    /*
> > +     * Step 5:
> > +     * Find the COS ID (find_cos result is '>= 0' or an available COS ID is
> > +     * picked, then update ref according to COS ID.
> > +     */
> > +    ref[cos]++;
> > +    ASSERT(!cos || ref[cos]);
> > +    ASSERT(!old_cos || ref[old_cos]);
> > +    ref[old_cos]--;
> > +    spin_unlock(&info->ref_lock);
> > +
> > +    /*
> > +     * Step 6:
> > +     * Save the COS ID into current domain's psr_cos_ids[] so that we can know
> > +     * which COS the domain is using on the socket. One domain can only use
> > +     * one COS ID at same time on each socket.
> > +     */
> > +    d->arch.psr_cos_ids[socket] = cos;
> > +
> 
> Please put the "free_array" label here instead of duplicating the code
> below.
> 
Got it. Thx!

> > +    xfree(val_array);
> > +    return ret;
> > +
> > + unlock_free_array:
> > +    spin_unlock(&info->ref_lock);
> > + free_array:
> > +    xfree(val_array);
> > +    return ret;
> > +}
> > +
> >  /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
> >  static void psr_free_cos(struct domain *d)
> >  {
> > +    unsigned int socket, cos;
> > +
> > +    ASSERT(socket_info);
> > +
> > +    if ( !d->arch.psr_cos_ids )
> > +        return;
> > +
> > +    /* Domain is destroied so its cos_ref should be decreased. */
> > +    for ( socket = 0; socket < nr_sockets; socket++ )
> > +    {
> > +        struct psr_socket_info *info;
> > +
> > +        /* cos 0 is default one which does not need be handled. */
> > +        cos = d->arch.psr_cos_ids[socket];
> > +        if ( cos == 0 )
> > +            continue;
> > +
> > +        info = socket_info + socket;
> > +        spin_lock(&info->ref_lock);
> > +        ASSERT(!cos || info->cos_ref[cos]);
> > +        info->cos_ref[cos]--;
> 
> This recurring pattern of assertion and decrement could surely be
> put in a helper function (and the for symmetry also for increment).
> 
Ok, but if we use memset to restore 'cos_ref[]' per above comment, I think it
is unnecessary.

> Jan
Jan Beulich April 12, 2017, 9:09 a.m. UTC | #3
>>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-11 09:01:53, Jan Beulich wrote:
>> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
>> > +        info->cos_ref[cos]--;
>> > +        spin_unlock(&info->ref_lock);
>> > +
>> > +        d->arch.psr_cos_ids[socket] = 0;
>> > +    }
>> 
>> Overall, while you say in the revision log that this was a suggestion of
>> mine, I don't recall any such (and I've just checked the v9 thread of
>> this patch without finding anything), and hence it's not really clear to
>> me why this is needed. After all you should be freeing info anyway
> 
> We discussed this in v9 05 patch.

Ah, that's why I didn't find it.

> I paste it below for your convenience to
> check.
> [Sun Yi]:
>   > So, you think the MSRs values may not be valid after such process and 
>   > reloading (write MSRs to default value) is needed. If so, I would like 
>   > to do more operations in 'free_feature()':
>   > 1. Iterate all domains working on the offline socket to change
>   >    'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
>   >    status.
>   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
>   > 
>   > These can make the socket's info be totally restored back to init status.
> 
> [Jan]
>   Yes, that's what I think is needed.
> 
>> (albeit I can't see this happening, which would look to be a bug in
>> patch 5), so getting the refcounts adjusted seems pointless in any
>> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
> 
> We only free resources in 'socket_info[socekt]' but do not free itself.
> Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]'
> is not a pointer that can be directly freed.
>   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
> 
> That is the reason to reduce the 'info->cos_ref[cos]'.

I see. But then there's no need to decrement it for each domain
using it, you could simply flush it to zero.

>> certain - this may indeed by unavoidable, to match up with
>> psr_alloc_cos() using xzalloc.
>> 
>> Furthermore I'm not at all convinced this is appropriate to do in the
>> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> have a few thousand VMs, the loop above may take a while.
>> 
> Hmm, that may be a potential issue. I have two proposals below. Could you
> please help to check which one you prefer? Or provide another solution?
> 
> 1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]'
>    of all domains. The action is protected by 'ref_lock' to avoid confliction
>    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
>    the array to 0 in free_socket_resources().
> 
> 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
>    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
>    and can memset the array to 0 when socket is offline. But here is an issue
>    that we do not know how many members this array should have. I cannot find
>    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use reallocation
>    in 'psr_alloc_cos' if the newly created domain's id is bigger than current
>    array number.

The number of domains is limited by the special DOMID_* values.
However, allocating an array with 32k entries doesn't sound very
reasonable. Sadly the other solution doesn't look very attractive
either, as there'll be quite a bit of synchronization needed (you'd
have to defer the same socket being re-used by a CPU being
onlined until you've done the cleanup). This may need some more
thought, but I can't see myself finding time for this any time soon,
so I'm afraid I'll have to leave it to you for now.

Jan
Yi Sun April 12, 2017, 12:23 p.m. UTC | #4
On 17-04-12 03:09:56, Jan Beulich wrote:
> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> >> > +        info->cos_ref[cos]--;
> >> > +        spin_unlock(&info->ref_lock);
> >> > +
> >> > +        d->arch.psr_cos_ids[socket] = 0;
> >> > +    }
> >> 
> >> Overall, while you say in the revision log that this was a suggestion of
> >> mine, I don't recall any such (and I've just checked the v9 thread of
> >> this patch without finding anything), and hence it's not really clear to
> >> me why this is needed. After all you should be freeing info anyway
> > 
> > We discussed this in v9 05 patch.
> 
> Ah, that's why I didn't find it.
> 
> > I paste it below for your convenience to
> > check.
> > [Sun Yi]:
> >   > So, you think the MSRs values may not be valid after such process and 
> >   > reloading (write MSRs to default value) is needed. If so, I would like 
> >   > to do more operations in 'free_feature()':
> >   > 1. Iterate all domains working on the offline socket to change
> >   >    'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
> >   >    status.
> >   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
> >   > 
> >   > These can make the socket's info be totally restored back to init status.
> > 
> > [Jan]
> >   Yes, that's what I think is needed.
> > 
> >> (albeit I can't see this happening, which would look to be a bug in
> >> patch 5), so getting the refcounts adjusted seems pointless in any
> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
> > 
> > We only free resources in 'socket_info[socekt]' but do not free itself.
> > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]'
> > is not a pointer that can be directly freed.
> >   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
> > 
> > That is the reason to reduce the 'info->cos_ref[cos]'.
> 
> I see. But then there's no need to decrement it for each domain
> using it, you could simply flush it to zero.
> 
> >> certain - this may indeed by unavoidable, to match up with
> >> psr_alloc_cos() using xzalloc.
> >> 
> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> have a few thousand VMs, the loop above may take a while.
> >> 
> > Hmm, that may be a potential issue. I have two proposals below. Could you
> > please help to check which one you prefer? Or provide another solution?
> > 
> > 1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]'
> >    of all domains. The action is protected by 'ref_lock' to avoid confliction
> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
> >    the array to 0 in free_socket_resources().
> > 
> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
> >    and can memset the array to 0 when socket is offline. But here is an issue
> >    that we do not know how many members this array should have. I cannot find
> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use reallocation
> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than current
> >    array number.
> 
> The number of domains is limited by the special DOMID_* values.
> However, allocating an array with 32k entries doesn't sound very
> reasonable.

I think 32K entries should be the extreme case. I can allocate e.g. 100 entries
when the first domain is created. If a new domain's id exceeds 100, reallocate
another 100 entries. The total number of entries allocated should be less than
32K. This is a functional requirement which cannot be avoided. What do you think?

> Sadly the other solution doesn't look very attractive
> either, as there'll be quite a bit of synchronization needed (you'd
> have to defer the same socket being re-used by a CPU being
> onlined until you've done the cleanup). This may need some more
> thought, but I can't see myself finding time for this any time soon,
> so I'm afraid I'll have to leave it to you for now.
> 
Yes, the first option needs careful consideration of the synchronization, which
is more complex than the second option.

> Jan
Jan Beulich April 12, 2017, 12:42 p.m. UTC | #5
>>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-12 03:09:56, Jan Beulich wrote:
>> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-04-11 09:01:53, Jan Beulich wrote:
>> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
>> >> > +        info->cos_ref[cos]--;
>> >> > +        spin_unlock(&info->ref_lock);
>> >> > +
>> >> > +        d->arch.psr_cos_ids[socket] = 0;
>> >> > +    }
>> >> 
>> >> Overall, while you say in the revision log that this was a suggestion of
>> >> mine, I don't recall any such (and I've just checked the v9 thread of
>> >> this patch without finding anything), and hence it's not really clear to
>> >> me why this is needed. After all you should be freeing info anyway
>> > 
>> > We discussed this in v9 05 patch.
>> 
>> Ah, that's why I didn't find it.
>> 
>> > I paste it below for your convenience to
>> > check.
>> > [Sun Yi]:
>> >   > So, you think the MSRs values may not be valid after such process and 
>> >   > reloading (write MSRs to default value) is needed. If so, I would like 
>> >   > to do more operations in 'free_feature()':
>> >   > 1. Iterate all domains working on the offline socket to change
>> >   >    'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
>> >   >    status.
>> >   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
>> >   > 
>> >   > These can make the socket's info be totally restored back to init 
> status.
>> > 
>> > [Jan]
>> >   Yes, that's what I think is needed.
>> > 
>> >> (albeit I can't see this happening, which would look to be a bug in
>> >> patch 5), so getting the refcounts adjusted seems pointless in any
>> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
>> > 
>> > We only free resources in 'socket_info[socekt]' but do not free itself.
>> > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]'
>> > is not a pointer that can be directly freed.
>> >   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
>> > 
>> > That is the reason to reduce the 'info->cos_ref[cos]'.
>> 
>> I see. But then there's no need to decrement it for each domain
>> using it, you could simply flush it to zero.
>> 
>> >> certain - this may indeed by unavoidable, to match up with
>> >> psr_alloc_cos() using xzalloc.
>> >> 
>> >> Furthermore I'm not at all convinced this is appropriate to do in the
>> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> >> have a few thousand VMs, the loop above may take a while.
>> >> 
>> > Hmm, that may be a potential issue. I have two proposals below. Could you
>> > please help to check which one you prefer? Or provide another solution?
>> > 
>> > 1. Start a tasklet in free_socket_resources() to restore 
> 'psr_cos_ids[socket]'
>> >    of all domains. The action is protected by 'ref_lock' to avoid 
> confliction
>> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
>> >    the array to 0 in free_socket_resources().
>> > 
>> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
>> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
>> >    and can memset the array to 0 when socket is offline. But here is an 
> issue
>> >    that we do not know how many members this array should have. I cannot 
> find
>> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> reallocation
>> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> current
>> >    array number.
>> 
>> The number of domains is limited by the special DOMID_* values.
>> However, allocating an array with 32k entries doesn't sound very
>> reasonable.
> 
> I think 32K entries should be the extreme case. I can allocate e.g. 100 entries
> when the first domain is created. If a new domain's id exceeds 100, reallocate
> another 100 entries. The total number of entries allocated should be less than
> 32K. This is a functional requirement which cannot be avoided. How do you 
> think?

So how many entries would your array have once I start the 32,000th
domain (having at any one time at most a single one running, besides
Dom0)?

Jan
Yi Sun April 13, 2017, 8:11 a.m. UTC | #6
On 17-04-12 06:42:01, Jan Beulich wrote:
> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-12 03:09:56, Jan Beulich wrote:
> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> >> >> > +        info->cos_ref[cos]--;
> >> >> > +        spin_unlock(&info->ref_lock);
> >> >> > +
> >> >> > +        d->arch.psr_cos_ids[socket] = 0;
> >> >> > +    }
> >> >> 
> >> >> Overall, while you say in the revision log that this was a suggestion of
> >> >> mine, I don't recall any such (and I've just checked the v9 thread of
> >> >> this patch without finding anything), and hence it's not really clear to
> >> >> me why this is needed. After all you should be freeing info anyway
> >> > 
> >> > We discussed this in v9 05 patch.
> >> 
> >> Ah, that's why I didn't find it.
> >> 
> >> > I paste it below for your convenience to
> >> > check.
> >> > [Sun Yi]:
> >> >   > So, you think the MSRs values may not be valid after such process and 
> >> >   > reloading (write MSRs to default value) is needed. If so, I would like 
> >> >   > to do more operations in 'free_feature()':
> >> >   > 1. Iterate all domains working on the offline socket to change
> >> >   >    'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
> >> >   >    status.
> >> >   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
> >> >   > 
> >> >   > These can make the socket's info be totally restored back to init 
> > status.
> >> > 
> >> > [Jan]
> >> >   Yes, that's what I think is needed.
> >> > 
> >> >> (albeit I can't see this happening, which would look to be a bug in
> >> >> patch 5), so getting the refcounts adjusted seems pointless in any
> >> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
> >> > 
> >> > We only free resources in 'socket_info[socekt]' but do not free itself.
> >> > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]'
> >> > is not a pointer that can be directly freed.
> >> >   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
> >> > 
> >> > That is the reason to reduce the 'info->cos_ref[cos]'.
> >> 
> >> I see. But then there's no need to decrement it for each domain
> >> using it, you could simply flush it to zero.
> >> 
> >> >> certain - this may indeed by unavoidable, to match up with
> >> >> psr_alloc_cos() using xzalloc.
> >> >> 
> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> >> have a few thousand VMs, the loop above may take a while.
> >> >> 
> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
> >> > please help to check which one you prefer? Or provide another solution?
> >> > 
> >> > 1. Start a tasklet in free_socket_resources() to restore 
> > 'psr_cos_ids[socket]'
> >> >    of all domains. The action is protected by 'ref_lock' to avoid 
> > confliction
> >> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
> >> >    the array to 0 in free_socket_resources().
> >> > 
> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
> >> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
> >> >    and can memset the array to 0 when socket is offline. But here is an 
> > issue
> >> >    that we do not know how many members this array should have. I cannot 
> > find
> >> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> > reallocation
> >> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> > current
> >> >    array number.
> >> 
> >> The number of domains is limited by the special DOMID_* values.
> >> However, allocating an array with 32k entries doesn't sound very
> >> reasonable.
> > 
> > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries
> > when the first domain is created. If a new domain's id exceeds 100, reallocate
> > another 100 entries. The total number of entries allocated should be less than
> > 32K. This is a functional requirement which cannot be avoided. How do you 
> > think?
> 
> So how many entries would your array have once I start the 32,000th
> domain (having at any one time at most a single one running, besides
> Dom0)?
> 
In such a case, we have to keep a 32K array because the domain_id is the index
used to access the array. But this array is per socket, so the total memory used
should not be too much.

After considering this issue more, I think the original code might not be
so unacceptable. To my knowledge, an Intel Xeon Phi chip can support at most
288 CPUs. So, the number of domains running at the same time in reality may not
be very large (insufficient resources). If this hypothesis is right, a loop that
writes 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I
am wrong, please correct me. Thanks!
Jan Beulich April 13, 2017, 9:41 a.m. UTC | #7
>>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-12 06:42:01, Jan Beulich wrote:
>> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-04-12 03:09:56, Jan Beulich wrote:
>> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
>> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
>> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
>> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
>> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> >> >> have a few thousand VMs, the loop above may take a while.
>> >> >> 
>> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
>> >> > please help to check which one you prefer? Or provide another solution?
>> >> > 
>> >> > 1. Start a tasklet in free_socket_resources() to restore 
>> > 'psr_cos_ids[socket]'
>> >> >    of all domains. The action is protected by 'ref_lock' to avoid 
>> > confliction
>> >> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
>> >> >    the array to 0 in free_socket_resources().
>> >> > 
>> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
>> >> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
>> >> >    and can memset the array to 0 when socket is offline. But here is an 
>> > issue
>> >> >    that we do not know how many members this array should have. I cannot 
>> > find
>> >> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
>> > reallocation
>> >> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
>> > current
>> >> >    array number.
>> >> 
>> >> The number of domains is limited by the special DOMID_* values.
>> >> However, allocating an array with 32k entries doesn't sound very
>> >> reasonable.
>> > 
>> > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries
>> > when the first domain is created. If a new domain's id exceeds 100, reallocate
>> > another 100 entries. The total number of entries allocated should be less than
>> > 32K. This is a functional requirement which cannot be avoided. How do you 
>> > think?
>> 
>> So how many entries would your array have once I start the 32,000th
>> domain (having at any one time at most a single one running, besides
>> Dom0)?
>> 
> In such case, we have to keep a 32K array because the domain_id is the index to
> access the array. But this array is per socket so the whole memory used should
> not be too much.

We carefully avoid any runtime allocations of order > 0, so if you
were to set up such an array, you'd need to use vmalloc()/vzalloc().
But I continue to be unconvinced that we want such a large array
in the first place.

> After considering this issue more, I think the original codes might not be
> so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
> 288 CPUs. So, I think the domains running at same time in reality may not be
> so many (no efficient resources). If this hypothesis is right, a loop to write
> 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
> wrong, please correct me. Thanks!

What relationship does the number of CPUs have to the number of
domains on a host? There could be thousands with just a few dozen
CPUs, provided none or very few of them have high demands on
CPU resources. Additionally please never forget that system sizes
basically only ever grow. Plus we wouldn't want a latent issue here
in case we ever end up needing to widen domain IDs beyond 16 bits.

Jan
Yi Sun April 13, 2017, 10:49 a.m. UTC | #8
On 17-04-13 03:41:44, Jan Beulich wrote:
> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-12 06:42:01, Jan Beulich wrote:
> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> >> >> have a few thousand VMs, the loop above may take a while.
> >> >> >> 
> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
> >> >> > please help to check which one you prefer? Or provide another solution?
> >> >> > 
> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
> >> > 'psr_cos_ids[socket]'
> >> >> >    of all domains. The action is protected by 'ref_lock' to avoid 
> >> > confliction
> >> >> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
> >> >> >    the array to 0 in free_socket_resources().
> >> >> > 
> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
> >> >> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
> >> >> >    and can memset the array to 0 when socket is offline. But here is an 
> >> > issue
> >> >> >    that we do not know how many members this array should have. I cannot 
> >> > find
> >> >> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> >> > reallocation
> >> >> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> >> > current
> >> >> >    array number.
> >> >> 
> >> >> The number of domains is limited by the special DOMID_* values.
> >> >> However, allocating an array with 32k entries doesn't sound very
> >> >> reasonable.
> >> > 
> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries
> >> > when the first domain is created. If a new domain's id exceeds 100, reallocate
> >> > another 100 entries. The total number of entries allocated should be less than
> >> > 32K. This is a functional requirement which cannot be avoided. How do you 
> >> > think?
> >> 
> >> So how many entries would your array have once I start the 32,000th
> >> domain (having at any one time at most a single one running, besides
> >> Dom0)?
> >> 
> > In such case, we have to keep a 32K array because the domain_id is the index to
> > access the array. But this array is per socket so the whole memory used should
> > not be too much.
> 
> We carefully avoid any runtime allocations of order > 0, so if you
> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> But I continue to be unconvinced that we want such a large array
> in the first place.
> 
> > After considering this issue more, I think the original codes might not be
> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
> > 288 CPUs. So, I think the domains running at same time in reality may not be
> > so many (no efficient resources). If this hypothesis is right, a loop to write
> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
> > wrong, please correct me. Thanks!
> 
> What relationship does the number of CPUs have to the number of
> domains on a host? There could be thousands with just a few dozen
> CPUs, provided none or very few of them have high demands on
> CPU resources. Additionally please never forget that system sizes
> basically only ever grow. Plus we wouldn't want a latent issue here
> in case we ever end up needing to widen domain IDs beyond 16 bits.
> 
How about a per-socket array like this:
uint32_t domain_switch[1024];

Every bit represents a domain ID. Then, we can handle this case as below:
1. In 'psr_cpu_init()', clear the array to 0. I think this is enough to cover
   the socket offline case. We do not need to clear it in 'free_socket_resources'.

2. In 'psr_ctxt_switch_to()', call test_and_set_bit(domain_id, domain_switch) to
   set the bit corresponding to domain_id to 1. If the old value is 0 and
   'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to 0.

3. In 'psr_set_val()', call test_and_set_bit(domain_id, domain_switch) to set the
   bit to 1 too. Then, update 'psr_cos_ids[socket]' according to the find/pick flow.

Then, we only use 4KB per socket.
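
As a self-contained sketch of this proposal (the array size, helper names,
and the minimal test_and_set_bit stand-in below are assumptions for
illustration only; the real patch would use Xen's own bitops and data
structures), the lazy reset could work roughly like this:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_DOMID     (32 * 1024)
    #define BITS_PER_WORD 32

    /* One bitmap per socket: one bit per domain ID, 4KB in total. */
    static uint32_t domain_switch[MAX_DOMID / BITS_PER_WORD];

    static int test_and_set_bit(unsigned int nr, uint32_t *addr)
    {
        uint32_t mask = 1u << (nr % BITS_PER_WORD);
        uint32_t old = addr[nr / BITS_PER_WORD];

        addr[nr / BITS_PER_WORD] = old | mask;
        return (old & mask) != 0;
    }

    /* Socket offline/online: forget which domains have been seen. */
    static void socket_reinit(void)
    {
        memset(domain_switch, 0, sizeof(domain_switch));
    }

    /* Context-switch path: first sighting since re-init resets the COS ID. */
    static unsigned int effective_cos(unsigned int domid, unsigned int *cos_id)
    {
        if ( !test_and_set_bit(domid, domain_switch) && *cos_id != 0 )
            *cos_id = 0;
        return *cos_id;
    }

    int main(void)
    {
        unsigned int dom1_cos = 5;            /* stale value from before re-init */

        socket_reinit();
        printf("first switch: COS %u\n", effective_cos(1, &dom1_cos)); /* 0 */
        dom1_cos = 3;                         /* later set via the set-value path */
        printf("next switch:  COS %u\n", effective_cos(1, &dom1_cos)); /* 3 */
        return 0;
    }
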
Jan Beulich April 13, 2017, 10:58 a.m. UTC | #9
>>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-13 03:41:44, Jan Beulich wrote:
>> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-04-12 06:42:01, Jan Beulich wrote:
>> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
>> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
>> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
>> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
>> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
>> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
>> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> >> >> >> have a few thousand VMs, the loop above may take a while.
>> >> >> >> 
>> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
>> >> >> > please help to check which one you prefer? Or provide another solution?
>> >> >> > 
>> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
>> >> > 'psr_cos_ids[socket]'
>> >> >> >    of all domains. The action is protected by 'ref_lock' to avoid 
>> >> > confliction
>> >> >> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
> memset
>> >> >> >    the array to 0 in free_socket_resources().
>> >> >> > 
>> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change 
> index
>> >> >> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
> socket
>> >> >> >    and can memset the array to 0 when socket is offline. But here is an 
>> >> > issue
>> >> >> >    that we do not know how many members this array should have. I cannot 
>> >> > find
>> >> >> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
>> >> > reallocation
>> >> >> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
>> >> > current
>> >> >> >    array number.
>> >> >> 
>> >> >> The number of domains is limited by the special DOMID_* values.
>> >> >> However, allocating an array with 32k entries doesn't sound very
>> >> >> reasonable.
>> >> > 
>> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
> entries
>> >> > when the first domain is created. If a new domain's id exceeds 100, 
> reallocate
>> >> > another 100 entries. The total number of entries allocated should be less 
> than
>> >> > 32K. This is a functional requirement which cannot be avoided. How do you 
>> >> > think?
>> >> 
>> >> So how many entries would your array have once I start the 32,000th
>> >> domain (having at any one time at most a single one running, besides
>> >> Dom0)?
>> >> 
>> > In such case, we have to keep a 32K array because the domain_id is the 
> index to
>> > access the array. But this array is per socket so the whole memory used 
> should
>> > not be too much.
>> 
>> We carefully avoid any runtime allocations of order > 0, so if you
>> were to set up such an array, you'd need to use vmalloc()/vzalloc().
>> But I continue to be unconvinced that we want such a large array
>> in the first place.
>> 
>> > After considering this issue more, I think the original codes might not be
>> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
>> > 288 CPUs. So, I think the domains running at same time in reality may not 
> be
>> > so many (no efficient resources). If this hypothesis is right, a loop to 
> write
>> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
>> > wrong, please correct me. Thanks!
>> 
>> What relationship does the number of CPUs have to the number of
>> domains on a host? There could be thousands with just a few dozen
>> CPUs, provided none or very few of them have high demands on
>> CPU resources. Additionally please never forget that system sizes
>> basically only ever grow. Plus we wouldn't want a latent issue here
>> in case we ever end up needing to widen domain IDs beyond 16 bits.
>> 
> How about a per socket array like this:
> uint32_t domain_switch[1024];
> 
> Every bit represents a domain id. Then, we can handle this case as below:
> 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
>    cover socket offline case. We do not need to clear it in 
> 'free_socket_resources'.
> 
> 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
>    the bit to 1 according to domain_id. If the old value is 0 and the 
>    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> 
> 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
>    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
> 
> Then, we only use 4KB for one socket.

This looks to come closer to something I'd consider acceptable, but
I may not understand your intentions in full yet: For one, there's
nowhere you clear the bit (other than presumably during socket
cleanup). And then I don't understand the test_and_ parts of the
constructs above, i.e. you don't clarify what the return values
would be used/needed for.

Jan
Yi Sun April 13, 2017, 11:11 a.m. UTC | #10
On 17-04-13 04:58:06, Jan Beulich wrote:
> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-13 03:41:44, Jan Beulich wrote:
> >> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote:
> >> > On 17-04-12 06:42:01, Jan Beulich wrote:
> >> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
> >> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> >> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
> >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> >> >> >> have a few thousand VMs, the loop above may take a while.
> >> >> >> >> 
> >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
> >> >> >> > please help to check which one you prefer? Or provide another solution?
> >> >> >> > 
> >> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
> >> >> > 'psr_cos_ids[socket]'
> >> >> >> >    of all domains. The action is protected by 'ref_lock' to avoid 
> >> >> > confliction
> >> >> >> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
> > memset
> >> >> >> >    the array to 0 in free_socket_resources().
> >> >> >> > 
> >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change 
> > index
> >> >> >> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
> > socket
> >> >> >> >    and can memset the array to 0 when socket is offline. But here is an 
> >> >> > issue
> >> >> >> >    that we do not know how many members this array should have. I cannot 
> >> >> > find
> >> >> >> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> >> >> > reallocation
> >> >> >> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> >> >> > current
> >> >> >> >    array number.
> >> >> >> 
> >> >> >> The number of domains is limited by the special DOMID_* values.
> >> >> >> However, allocating an array with 32k entries doesn't sound very
> >> >> >> reasonable.
> >> >> > 
> >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
> > entries
> >> >> > when the first domain is created. If a new domain's id exceeds 100, 
> > reallocate
> >> >> > another 100 entries. The total number of entries allocated should be less 
> > than
> >> >> > 32K. This is a functional requirement which cannot be avoided. How do you 
> >> >> > think?
> >> >> 
> >> >> So how many entries would your array have once I start the 32,000th
> >> >> domain (having at any one time at most a single one running, besides
> >> >> Dom0)?
> >> >> 
> >> > In such case, we have to keep a 32K array because the domain_id is the 
> > index to
> >> > access the array. But this array is per socket so the whole memory used 
> > should
> >> > not be too much.
> >> 
> >> We carefully avoid any runtime allocations of order > 0, so if you
> >> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> >> But I continue to be unconvinced that we want such a large array
> >> in the first place.
> >> 
> >> > After considering this issue more, I think the original codes might not be
> >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
> >> > 288 CPUs. So, I think the domains running at same time in reality may not 
> > be
> >> > so many (no efficient resources). If this hypothesis is right, a loop to 
> > write
> >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
> >> > wrong, please correct me. Thanks!
> >> 
> >> What relationship does the number of CPUs have to the number of
> >> domains on a host? There could be thousands with just a few dozen
> >> CPUs, provided none or very few of them have high demands on
> >> CPU resources. Additionally please never forget that system sizes
> >> basically only ever grow. Plus we wouldn't want a latent issue here
> >> in case we ever end up needing to widen domain IDs beyond 16 bits.
> >> 
> > How about a per socket array like this:
> > uint32_t domain_switch[1024];
> > 
> > Every bit represents a domain id. Then, we can handle this case as below:
> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
> >    cover socket offline case. We do not need to clear it in 
> > 'free_socket_resources'.
> > 
> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
> >    the bit to 1 according to domain_id. If the old value is 0 and the 
> >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> > 
> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
> >    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
> > 
> > Then, we only use 4KB for one socket.
> 
> This looks to come closer to something I'd consider acceptable, but
> I may not understand your intentions in full yet: For one, there's
> nowhere you clear the bit (other than presumably during socket
> cleanup). 

Actually, clearing the array in 'free_socket_resources' has the same effect. I
can move the clear action into it.

> And then I don't understand the test_and_ parts of the
> constructs above, i.e. you don't clarify what the return values
> would be used/needed for.
> 
Sorry, 0 means this domain has not been scheduled to the socket yet. If
test_and_ returns 0, that is the first time the domain runs on the socket
(the first time the socket is online). So, we need to restore 'psr_cos_ids[socket]'
to 0 in 'psr_ctxt_switch_to()'.
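
To make the intended use of the test_and_set_bit() return value concrete,
below is a minimal sketch of the proposal (not part of the patch; the
'domain_switch' layout, the structure and helper names are only illustrative,
and locking as well as the actual PSR_ASSOC programming are omitted):

/* Sketch only: per-socket bitmap use on context switch. */
#define PSR_DOMID_BITS  (32 * 1024)        /* one bit per domain id, 4KB */

struct psr_socket_info_sketch {
    unsigned long domain_switch[BITS_TO_LONGS(PSR_DOMID_BITS)];
    /* ... existing per-socket fields such as cos_ref[] and ref_lock ... */
};

static void psr_assoc_cos_sketch(struct domain *d, unsigned int socket,
                                 struct psr_socket_info_sketch *info)
{
    /*
     * test_and_set_bit() returns the old bit value.  0 means this is the
     * first time the domain runs on this socket since the socket came
     * online, so a COS ID left over from before the socket went offline
     * must be reset to the default 0.
     */
    if ( !test_and_set_bit(d->domain_id, info->domain_switch) &&
         d->arch.psr_cos_ids[socket] != 0 )
        d->arch.psr_cos_ids[socket] = 0;

    /* d->arch.psr_cos_ids[socket] is then used to program PSR_ASSOC. */
}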
Yi Sun April 13, 2017, 11:26 a.m. UTC | #11
On 17-04-13 19:11:54, Yi Sun wrote:
> On 17-04-13 04:58:06, Jan Beulich wrote:
> > >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
> > > On 17-04-13 03:41:44, Jan Beulich wrote:
> > >> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote:
> > >> > On 17-04-12 06:42:01, Jan Beulich wrote:
> > >> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote:
> > >> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> > >> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote:
> > >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> > >> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote:
> > >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
> > >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> > >> >> >> >> have a few thousand VMs, the loop above may take a while.
> > >> >> >> >> 
> > >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
> > >> >> >> > please help to check which one you prefer? Or provide another solution?
> > >> >> >> > 
> > >> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
> > >> >> > 'psr_cos_ids[socket]'
> > >> >> >> >    of all domains. The action is protected by 'ref_lock' to avoid 
> > >> >> > confliction
> > >> >> >> >    in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
> > > memset
> > >> >> >> >    the array to 0 in free_socket_resources().
> > >> >> >> > 
> > >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change 
> > > index
> > >> >> >> >    from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
> > > socket
> > >> >> >> >    and can memset the array to 0 when socket is offline. But here is an 
> > >> >> > issue
> > >> >> >> >    that we do not know how many members this array should have. I cannot 
> > >> >> > find
> > >> >> >> >    a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> > >> >> > reallocation
> > >> >> >> >    in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> > >> >> > current
> > >> >> >> >    array number.
> > >> >> >> 
> > >> >> >> The number of domains is limited by the special DOMID_* values.
> > >> >> >> However, allocating an array with 32k entries doesn't sound very
> > >> >> >> reasonable.
> > >> >> > 
> > >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
> > > entries
> > >> >> > when the first domain is created. If a new domain's id exceeds 100, 
> > > reallocate
> > >> >> > another 100 entries. The total number of entries allocated should be less 
> > > than
> > >> >> > 32K. This is a functional requirement which cannot be avoided. How do you 
> > >> >> > think?
> > >> >> 
> > >> >> So how many entries would your array have once I start the 32,000th
> > >> >> domain (having at any one time at most a single one running, besides
> > >> >> Dom0)?
> > >> >> 
> > >> > In such case, we have to keep a 32K array because the domain_id is the 
> > > index to
> > >> > access the array. But this array is per socket so the whole memory used 
> > > should
> > >> > not be too much.
> > >> 
> > >> We carefully avoid any runtime allocations of order > 0, so if you
> > >> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> > >> But I continue to be unconvinced that we want such a large array
> > >> in the first place.
> > >> 
> > >> > After considering this issue more, I think the original codes might not be
> > >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
> > >> > 288 CPUs. So, I think the domains running at same time in reality may not 
> > > be
> > >> > so many (no efficient resources). If this hypothesis is right, a loop to 
> > > write
> > >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
> > >> > wrong, please correct me. Thanks!
> > >> 
> > >> What relationship does the number of CPUs have to the number of
> > >> domains on a host? There could be thousands with just a few dozen
> > >> CPUs, provided none or very few of them have high demands on
> > >> CPU resources. Additionally please never forget that system sizes
> > >> basically only ever grow. Plus we wouldn't want a latent issue here
> > >> in case we ever end up needing to widen domain IDs beyond 16 bits.
> > >> 
> > > How about a per socket array like this:
> > > uint32_t domain_switch[1024];
> > > 
> > > Every bit represents a domain id. Then, we can handle this case as below:
> > > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
> > >    cover socket offline case. We do not need to clear it in 
> > > 'free_socket_resources'.
> > > 
> > > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
> > >    the bit to 1 according to domain_id. If the old value is 0 and the 
> > >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> > > 
> > > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
> > >    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
> > > 
> > > Then, we only use 4KB for one socket.
> > 
> > This looks to come closer to something I'd consider acceptable, but
> > I may not understand your intentions in full yet: For one, there's
> > nowhere you clear the bit (other than presumably during socket
> > cleanup). 
> 
> Actually, clear the array in 'free_socket_resources' has same effect. I can
> move clear action into it.
> 
> > And then I don't understand the test_and_ parts of the
> > constructs above, i.e. you don't clarify what the return values
> > would be used/needed for.
> > 
> Sorry, 0 means this domain has not been scheduled to the socket yet. If
> test_and_ returns 0, that is the first time the domain runs on the socket
> (the first time the socket is online). So, we need restore 'psr_cos_ids[socket]'
                 ^ missed 'after'. I mean the first time the domain is scheduled
                                   to the socket after the socket is online.
> to 0 in 'psr_ctxt_switch_to()'.
> 
Jan Beulich April 13, 2017, 11:31 a.m. UTC | #12
>>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-13 04:58:06, Jan Beulich wrote:
>> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
>> > How about a per socket array like this:
>> > uint32_t domain_switch[1024];
>> > 
>> > Every bit represents a domain id. Then, we can handle this case as below:
>> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
>> >    cover socket offline case. We do not need to clear it in 
>> > 'free_socket_resources'.
>> > 
>> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
>> >    the bit to 1 according to domain_id. If the old value is 0 and the 
>> >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
>> > 
>> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
>> >    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
>> > 
>> > Then, we only use 4KB for one socket.
>> 
>> This looks to come closer to something I'd consider acceptable, but
>> I may not understand your intentions in full yet: For one, there's
>> nowhere you clear the bit (other than presumably during socket
>> cleanup). 
> 
> Actually, clear the array in 'free_socket_resources' has same effect. I can
> move clear action into it.

That wasn't my point - I was asking about clearing individual bits.
Point being that if you only ever set bits in the map, you'll likely
end up iterating through all active domains anyway.

Jan
Yi Sun April 13, 2017, 11:44 a.m. UTC | #13
On 17-04-13 05:31:41, Jan Beulich wrote:
> >>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-13 04:58:06, Jan Beulich wrote:
> >> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
> >> > How about a per socket array like this:
> >> > uint32_t domain_switch[1024];
> >> > 
> >> > Every bit represents a domain id. Then, we can handle this case as below:
> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
> >> >    cover socket offline case. We do not need to clear it in 
> >> > 'free_socket_resources'.
> >> > 
> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
> >> >    the bit to 1 according to domain_id. If the old value is 0 and the 
> >> >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> >> > 
> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
> >> >    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
> >> > 
> >> > Then, we only use 4KB for one socket.
> >> 
> >> This looks to come closer to something I'd consider acceptable, but
> >> I may not understand your intentions in full yet: For one, there's
> >> nowhere you clear the bit (other than presumably during socket
> >> cleanup). 
> > 
> > Actually, clear the array in 'free_socket_resources' has same effect. I can
> > move clear action into it.
> 
> That wasn't my point - I was asking about clearing individual bits.
> Point being that if you only ever set bits in the map, you'll likely
> end up iterating through all active domains anyway.
> 
If entering 'free_socket_resources', that means no more actions to
the array on this socket except clearing it. Can I just memset this array
of the socket to 0?

> Jan
Jan Beulich April 13, 2017, 11:50 a.m. UTC | #14
>>> On 13.04.17 at 13:44, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-13 05:31:41, Jan Beulich wrote:
>> >>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-04-13 04:58:06, Jan Beulich wrote:
>> >> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
>> >> > How about a per socket array like this:
>> >> > uint32_t domain_switch[1024];
>> >> > 
>> >> > Every bit represents a domain id. Then, we can handle this case as below:
>> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
>> >> >    cover socket offline case. We do not need to clear it in 
>> >> > 'free_socket_resources'.
>> >> > 
>> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
>> >> >    the bit to 1 according to domain_id. If the old value is 0 and the 
>> >> >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
>> >> > 
>> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
>> >> >    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
>> >> > 
>> >> > Then, we only use 4KB for one socket.
>> >> 
>> >> This looks to come closer to something I'd consider acceptable, but
>> >> I may not understand your intentions in full yet: For one, there's
>> >> nowhere you clear the bit (other than presumably during socket
>> >> cleanup). 
>> > 
>> > Actually, clear the array in 'free_socket_resources' has same effect. I can
>> > move clear action into it.
>> 
>> That wasn't my point - I was asking about clearing individual bits.
>> Point being that if you only ever set bits in the map, you'll likely
>> end up iterating through all active domains anyway.
>> 
> If entering 'free_socket_resources', that means no more actions to
> the array on this socket except clearing it. Can I just memset this array
> of the socket to 0?

You can, afaict, but unless first you act on the set bits I can't see why
you would want to track the bits in the first place. Or maybe I'm still
not understanding your intention, in which case I guess the best you
can do is simply implement your plan, and we'll discuss it in v11 review.

Jan
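
For reference, a minimal sketch of what 'acting on the set bits' before
clearing the per-socket bitmap might look like in free_socket_resources();
the bitmap field and structure reuse the names from the sketch earlier in
this thread and are assumptions, not code from this series:

/* Sketch only: reset COS IDs for exactly the domains that ran on this socket. */
static void reset_socket_cos_ids_sketch(unsigned int socket,
                                        struct psr_socket_info_sketch *info)
{
    unsigned int id;

    for ( id = find_first_bit(info->domain_switch, PSR_DOMID_BITS);
          id < PSR_DOMID_BITS;
          id = find_next_bit(info->domain_switch, PSR_DOMID_BITS, id + 1) )
    {
        struct domain *d = rcu_lock_domain_by_id(id);

        if ( d )
        {
            d->arch.psr_cos_ids[socket] = 0;
            rcu_unlock_domain(d);
        }
    }

    /* The socket is going offline; start from a clean bitmap next time. */
    memset(info->domain_switch, 0, sizeof(info->domain_switch));
}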
diff mbox

Patch

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index dc213a7..6ed71e2 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1437,21 +1437,33 @@  long arch_do_domctl(
         switch ( domctl->u.psr_cat_op.cmd )
         {
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3);
+            if ( domctl->u.psr_cat_op.data !=
+                 (uint32_t)domctl->u.psr_cat_op.data )
+                return -EINVAL;
+
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_CBM_TYPE_L3);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_CODE);
+            if ( domctl->u.psr_cat_op.data !=
+                 (uint32_t)domctl->u.psr_cat_op.data )
+                return -EINVAL;
+
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_CBM_TYPE_L3_CODE);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA:
-            ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target,
-                                 domctl->u.psr_cat_op.data,
-                                 PSR_CBM_TYPE_L3_DATA);
+            if ( domctl->u.psr_cat_op.data !=
+                 (uint32_t)domctl->u.psr_cat_op.data )
+                return -EINVAL;
+
+            ret = psr_set_val(d, domctl->u.psr_cat_op.target,
+                              domctl->u.psr_cat_op.data,
+                              PSR_CBM_TYPE_L3_DATA);
             break;
 
         case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM:
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 25fcd21..9d805d6 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -157,10 +157,26 @@  static void free_socket_resources(unsigned int socket)
 {
     unsigned int i;
     struct psr_socket_info *info = socket_info + socket;
+    struct domain *d;
 
     if ( !info )
         return;
 
+    /* Restore each domain's cos id to 0 when the socket goes offline. */
+    for_each_domain ( d )
+    {
+        unsigned int cos = d->arch.psr_cos_ids[socket];
+        if ( cos == 0 )
+            continue;
+
+        spin_lock(&info->ref_lock);
+        ASSERT(!cos || info->cos_ref[cos]);
+        info->cos_ref[cos]--;
+        spin_unlock(&info->ref_lock);
+
+        d->arch.psr_cos_ids[socket] = 0;
+    }
+
     /*
      * Free resources of features. The global feature object, e.g. feat_l3_cat,
      * may not be freed here if it is not added into array. It is simply being
@@ -574,15 +590,210 @@  int psr_get_val(struct domain *d, unsigned int socket,
     return 0;
 }
 
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t cbm, enum cbm_type type)
+/* Set value functions */
+static unsigned int get_cos_num(const struct psr_socket_info *info)
 {
     return 0;
 }
 
+static int gather_val_array(uint32_t val[],
+                            unsigned int array_len,
+                            const struct psr_socket_info *info,
+                            unsigned int old_cos)
+{
+    return -EINVAL;
+}
+
+static int insert_val_to_array(uint32_t val[],
+                               unsigned int array_len,
+                               const struct psr_socket_info *info,
+                               enum psr_feat_type feat_type,
+                               enum cbm_type type,
+                               uint32_t new_val)
+{
+    return -EINVAL;
+}
+
+static int find_cos(const uint32_t val[], unsigned int array_len,
+                    enum psr_feat_type feat_type,
+                    const struct psr_socket_info *info,
+                    spinlock_t *ref_lock)
+{
+    ASSERT(spin_is_locked(ref_lock));
+
+    return -ENOENT;
+}
+
+static int pick_avail_cos(const struct psr_socket_info *info,
+                          spinlock_t *ref_lock,
+                          const uint32_t val[], unsigned int array_len,
+                          unsigned int old_cos,
+                          enum psr_feat_type feat_type)
+{
+    ASSERT(spin_is_locked(ref_lock));
+
+    return -ENOENT;
+}
+
+static int write_psr_msr(unsigned int socket, unsigned int cos,
+                         uint32_t val, enum psr_feat_type feat_type)
+{
+    return -ENOENT;
+}
+
+/* The whole set process is protected by domctl_lock. */
+int psr_set_val(struct domain *d, unsigned int socket,
+                uint32_t val, enum cbm_type type)
+{
+    unsigned int old_cos;
+    int cos, ret;
+    unsigned int *ref;
+    uint32_t *val_array;
+    struct psr_socket_info *info = get_socket_info(socket);
+    unsigned int array_len;
+    enum psr_feat_type feat_type;
+
+    if ( IS_ERR(info) )
+        return PTR_ERR(info);
+
+    feat_type = psr_cbm_type_to_feat_type(type);
+    if ( feat_type >= ARRAY_SIZE(info->features) ||
+         !info->features[feat_type] )
+        return -ENOENT;
+
+    /*
+     * Step 0:
+     * old_cos is the COS ID the current domain is using (0 by default).
+     *
+     * For every COS ID, there is a reference count to record how many domains
+     * are using the COS register corresponding to this COS ID.
+     * - If ref[old_cos] is 0, that means this COS is not used by any domain.
+     * - If ref[old_cos] is 1, that means this COS is only used by the
+     *   current domain.
+     * - If ref[old_cos] is more than 1, that means multiple domains are using
+     *   this COS.
+     */
+    old_cos = d->arch.psr_cos_ids[socket];
+    ASSERT(old_cos < MAX_COS_REG_CNT);
+
+    ref = info->cos_ref;
+
+    /*
+     * Step 1:
+     * Gather a value array to store all features' cos_reg_val[old_cos],
+     * and set the new input val into the array according to the feature's
+     * position in the array.
+     */
+    array_len = get_cos_num(info);
+    val_array = xzalloc_array(uint32_t, array_len);
+    if ( !val_array )
+        return -ENOMEM;
+
+    if ( (ret = gather_val_array(val_array, array_len, info, old_cos)) != 0 )
+        goto free_array;
+
+    if ( (ret = insert_val_to_array(val_array, array_len, info,
+                                    feat_type, type, val)) != 0 )
+        goto free_array;
+
+    spin_lock(&info->ref_lock);
+
+    /*
+     * Step 2:
+     * Try to find if there is already a COS ID on which all features' values
+     * are same as the array. Then, we can reuse this COS ID.
+     */
+    cos = find_cos(val_array, array_len, feat_type, info, &info->ref_lock);
+    if ( cos == old_cos )
+    {
+        ret = 0;
+        goto unlock_free_array;
+    }
+
+    /*
+     * Step 3:
+     * If we fail to find one, we need to pick an available COS ID.
+     * In fact, only a COS ID whose ref is 1 or 0 can be picked for the
+     * current domain. If old_cos is not 0 and its ref is 1, only the current
+     * domain is using this old_cos ID, so this old_cos ID can certainly be
+     * reused by the current domain. A ref of 0 means no domain is using
+     * this COS ID, so it can be used for the current domain too.
+     */
+    if ( cos < 0 )
+    {
+        cos = pick_avail_cos(info, &info->ref_lock, val_array,
+                             array_len, old_cos, feat_type);
+        if ( cos < 0 )
+        {
+            ret = cos;
+            goto unlock_free_array;
+        }
+
+        /*
+         * Step 4:
+         * Write the feature's MSR according to the COS ID and cbm_type.
+         */
+        ret = write_psr_msr(socket, cos, val, feat_type);
+        if ( ret )
+            goto unlock_free_array;
+    }
+
+    /*
+     * Step 5:
+     * Find the COS ID (find_cos result is '>= 0') or use the COS ID picked
+     * above, then update ref according to the COS ID.
+     */
+    ref[cos]++;
+    ASSERT(!cos || ref[cos]);
+    ASSERT(!old_cos || ref[old_cos]);
+    ref[old_cos]--;
+    spin_unlock(&info->ref_lock);
+
+    /*
+     * Step 6:
+     * Save the COS ID into the current domain's psr_cos_ids[] so that we
+     * know which COS the domain is using on the socket. One domain can only
+     * use one COS ID at a time on each socket.
+     */
+    d->arch.psr_cos_ids[socket] = cos;
+
+    xfree(val_array);
+    return ret;
+
+ unlock_free_array:
+    spin_unlock(&info->ref_lock);
+ free_array:
+    xfree(val_array);
+    return ret;
+}
+
 /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */
 static void psr_free_cos(struct domain *d)
 {
+    unsigned int socket, cos;
+
+    ASSERT(socket_info);
+
+    if ( !d->arch.psr_cos_ids )
+        return;
+
+    /* Domain is destroyed so its cos_ref should be decreased. */
+    for ( socket = 0; socket < nr_sockets; socket++ )
+    {
+        struct psr_socket_info *info;
+
+        /* cos 0 is default one which does not need be handled. */
+        cos = d->arch.psr_cos_ids[socket];
+        if ( cos == 0 )
+            continue;
+
+        info = socket_info + socket;
+        spin_lock(&info->ref_lock);
+        ASSERT(!cos || info->cos_ref[cos]);
+        info->cos_ref[cos]--;
+        spin_unlock(&info->ref_lock);
+    }
+
     xfree(d->arch.psr_cos_ids);
     d->arch.psr_cos_ids = NULL;
 }
diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h
index 7c6d38a..66d5218 100644
--- a/xen/include/asm-x86/psr.h
+++ b/xen/include/asm-x86/psr.h
@@ -73,8 +73,8 @@  int psr_get_info(unsigned int socket, enum cbm_type type,
                  uint32_t data[], unsigned int array_len);
 int psr_get_val(struct domain *d, unsigned int socket,
                 uint32_t *val, enum cbm_type type);
-int psr_set_l3_cbm(struct domain *d, unsigned int socket,
-                   uint64_t cbm, enum cbm_type type);
+int psr_set_val(struct domain *d, unsigned int socket,
+                uint32_t val, enum cbm_type type);
 
 int psr_domain_init(struct domain *d);
 void psr_domain_free(struct domain *d);
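
The setter framework above deliberately leaves find_cos() and the other
helpers as stubs for later patches. As a rough illustration of Step 2
(matching the gathered value array against the values already programmed
for each COS), one possible shape is sketched below; the per-feature
cos_reg_val[] cache, the cos_max parameter and the one-value-per-feature
simplification are assumptions for the sketch, not the actual follow-on
implementation:

/* Sketch only: try to find a COS ID whose programmed values match val[]. */
static int find_cos_sketch(const uint32_t val[], unsigned int array_len,
                           unsigned int cos_max,
                           const struct psr_socket_info *info)
{
    unsigned int cos, i;

    for ( cos = 0; cos <= cos_max; cos++ )
    {
        bool found = true;

        for ( i = 0; i < array_len; i++ )
        {
            /* Features not enabled on this socket are skipped. */
            if ( !info->features[i] )
                continue;

            if ( info->features[i]->cos_reg_val[cos] != val[i] )
            {
                found = false;
                break;
            }
        }

        if ( found )
            return cos;    /* Reuse this COS ID (Step 2 of the flow). */
    }

    return -ENOENT;        /* Caller falls back to pick_avail_cos() (Step 3). */
}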