Message ID | 1491054836-30488-10-git-send-email-yi.y.sun@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > --- a/xen/arch/x86/psr.c > +++ b/xen/arch/x86/psr.c > @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket) > { > unsigned int i; > struct psr_socket_info *info = socket_info + socket; > + struct domain *d; > > if ( !info ) > return; > > + /* Restore domain cos id to 0 when socket is offline. */ > + for_each_domain ( d ) > + { > + unsigned int cos = d->arch.psr_cos_ids[socket]; > + if ( cos == 0 ) Blank line between declaration(s) and statement(s) please. > + continue; > + > + spin_lock(&info->ref_lock); > + ASSERT(!cos || info->cos_ref[cos]); You've checked cos to be non-zero already above. > + info->cos_ref[cos]--; > + spin_unlock(&info->ref_lock); > + > + d->arch.psr_cos_ids[socket] = 0; > + } Overall, while you say in the revision log that this was a suggestion of mine, I don't recall any such (and I've just checked the v9 thread of this patch without finding anything), and hence it's not really clear to me why this is needed. After all you should be freeing info anyway (albeit I can't see this happening, which would look to be a bug in patch 5), so getting the refcounts adjusted seems pointless in any event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not certain - this may indeed by unavoidable, to match up with psr_alloc_cos() using xzalloc. Furthermore I'm not at all convinced this is appropriate to do in the context of a CPU_UP_CANCELED / CPU_DEAD notification: If you have a few thousand VMs, the loop above may take a while. > @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int socket, > return 0; > } > > -int psr_set_l3_cbm(struct domain *d, unsigned int socket, > - uint64_t cbm, enum cbm_type type) > +/* Set value functions */ > +static unsigned int get_cos_num(const struct psr_socket_info *info) > { > return 0; > } > > +static int gather_val_array(uint32_t val[], > + unsigned int array_len, > + const struct psr_socket_info *info, > + unsigned int old_cos) > +{ > + return -EINVAL; > +} > + > +static int insert_val_to_array(uint32_t val[], As indicated before, I'm pretty convinced "into" would be more natural than "to". > + unsigned int array_len, > + const struct psr_socket_info *info, > + enum psr_feat_type feat_type, > + enum cbm_type type, > + uint32_t new_val) > +{ > + return -EINVAL; > +} > + > +static int find_cos(const uint32_t val[], unsigned int array_len, > + enum psr_feat_type feat_type, > + const struct psr_socket_info *info, > + spinlock_t *ref_lock) I don't think I did suggest adding another parameter here. The lock is accessible from info, isn't it? In which case, as I _did_ suggest, you should drop const from it instead. But ... > +{ > + ASSERT(spin_is_locked(ref_lock)); ... the assertion seems rather pointless anyway with there just being a single caller, which very visibly acquires the lock up front. > +static int pick_avail_cos(const struct psr_socket_info *info, > + spinlock_t *ref_lock, Same here then. > +int psr_set_val(struct domain *d, unsigned int socket, > + uint32_t val, enum cbm_type type) > +{ > + unsigned int old_cos; > + int cos, ret; > + unsigned int *ref; > + uint32_t *val_array; > + struct psr_socket_info *info = get_socket_info(socket); > + unsigned int array_len; > + enum psr_feat_type feat_type; > + > + if ( IS_ERR(info) ) > + return PTR_ERR(info); > + > + feat_type = psr_cbm_type_to_feat_type(type); > + if ( feat_type > ARRAY_SIZE(info->features) || I think I did point out the off-by-one mistake here in an earlier patch already. 
> + !info->features[feat_type] ) > + return -ENOENT; > + > + /* > + * Step 0: > + * old_cos means the COS ID current domain is using. By default, it is 0. > + * > + * For every COS ID, there is a reference count to record how many domains > + * are using the COS register corresponding to this COS ID. > + * - If ref[old_cos] is 0, that means this COS is not used by any domain. > + * - If ref[old_cos] is 1, that means this COS is only used by current > + * domain. > + * - If ref[old_cos] is more than 1, that mean multiple domains are using > + * this COS. > + */ > + old_cos = d->arch.psr_cos_ids[socket]; > + ASSERT(old_cos < MAX_COS_REG_CNT); > + > + ref = info->cos_ref; > + > + /* > + * Step 1: > + * Gather a value array to store all features cos_reg_val[old_cos]. > + * And, set the input new val into array according to the feature's > + * position in array. > + */ > + array_len = get_cos_num(info); > + val_array = xzalloc_array(uint32_t, array_len); > + if ( !val_array ) > + return -ENOMEM; > + > + if ( (ret = gather_val_array(val_array, array_len, info, old_cos)) != 0 ) > + goto free_array; > + > + if ( (ret = insert_val_to_array(val_array, array_len, info, > + feat_type, type, val)) != 0 ) > + goto free_array; > + > + spin_lock(&info->ref_lock); > + > + /* > + * Step 2: > + * Try to find if there is already a COS ID on which all features' values > + * are same as the array. Then, we can reuse this COS ID. > + */ > + cos = find_cos(val_array, array_len, feat_type, info, &info->ref_lock); > + if ( cos == old_cos ) > + { > + ret = 0; > + goto unlock_free_array; > + } > + > + /* > + * Step 3: > + * If fail to find, we need pick an available COS ID. > + * In fact, only COS ID which ref is 1 or 0 can be picked for current > + * domain. If old_cos is not 0 and its ref==1, that means only current > + * domain is using this old_cos ID. So, this old_cos ID certainly can > + * be reused by current domain. Ref==0 means there is no any domain > + * using this COS ID. So it can be used for current domain too. > + */ > + if ( cos < 0 ) > + { > + cos = pick_avail_cos(info, &info->ref_lock, val_array, > + array_len, old_cos, feat_type); > + if ( cos < 0 ) > + { > + ret = cos; > + goto unlock_free_array; > + } > + > + /* > + * Step 4: > + * Write all features MSRs according to the COS ID. > + */ > + ret = write_psr_msr(socket, cos, val, feat_type); > + if ( ret ) > + goto unlock_free_array; > + } > + > + /* > + * Step 5: > + * Find the COS ID (find_cos result is '>= 0' or an available COS ID is > + * picked, then update ref according to COS ID. > + */ > + ref[cos]++; > + ASSERT(!cos || ref[cos]); > + ASSERT(!old_cos || ref[old_cos]); > + ref[old_cos]--; > + spin_unlock(&info->ref_lock); > + > + /* > + * Step 6: > + * Save the COS ID into current domain's psr_cos_ids[] so that we can know > + * which COS the domain is using on the socket. One domain can only use > + * one COS ID at same time on each socket. > + */ > + d->arch.psr_cos_ids[socket] = cos; > + Please put the "free_array" label here instead of duplicating the code below. > + xfree(val_array); > + return ret; > + > + unlock_free_array: > + spin_unlock(&info->ref_lock); > + free_array: > + xfree(val_array); > + return ret; > +} > + > /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */ > static void psr_free_cos(struct domain *d) > { > + unsigned int socket, cos; > + > + ASSERT(socket_info); > + > + if ( !d->arch.psr_cos_ids ) > + return; > + > + /* Domain is destroied so its cos_ref should be decreased. 
*/ > + for ( socket = 0; socket < nr_sockets; socket++ ) > + { > + struct psr_socket_info *info; > + > + /* cos 0 is default one which does not need be handled. */ > + cos = d->arch.psr_cos_ids[socket]; > + if ( cos == 0 ) > + continue; > + > + info = socket_info + socket; > + spin_lock(&info->ref_lock); > + ASSERT(!cos || info->cos_ref[cos]); > + info->cos_ref[cos]--; This recurring pattern of assertion and decrement could surely be put in a helper function (and the for symmetry also for increment). Jan
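A minimal sketch of the helper pair suggested at the end of the review above; the names psr_cos_ref_get/psr_cos_ref_put and their exact form are illustrative only and not part of the posted patch:

```c
/*
 * Sketch of the suggested helpers wrapping the recurring
 * assert-and-adjust pattern.  Both assume the caller holds
 * info->ref_lock.
 */
static void psr_cos_ref_get(struct psr_socket_info *info, unsigned int cos)
{
    ASSERT(spin_is_locked(&info->ref_lock));
    info->cos_ref[cos]++;
    /* COS 0 is the shared default; its count is not asserted on. */
    ASSERT(!cos || info->cos_ref[cos]);
}

static void psr_cos_ref_put(struct psr_socket_info *info, unsigned int cos)
{
    ASSERT(spin_is_locked(&info->ref_lock));
    ASSERT(!cos || info->cos_ref[cos]);
    info->cos_ref[cos]--;
}
```

Call sites such as the decrement in psr_free_cos() and the increment/decrement pair in psr_set_val() would then reduce to single calls.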
On 17-04-11 09:01:53, Jan Beulich wrote: > >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > > --- a/xen/arch/x86/psr.c > > +++ b/xen/arch/x86/psr.c > > @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket) > > { > > unsigned int i; > > struct psr_socket_info *info = socket_info + socket; > > + struct domain *d; > > > > if ( !info ) > > return; > > > > + /* Restore domain cos id to 0 when socket is offline. */ > > + for_each_domain ( d ) > > + { > > + unsigned int cos = d->arch.psr_cos_ids[socket]; > > + if ( cos == 0 ) > > Blank line between declaration(s) and statement(s) please. > Ok, will modify other places where have same issue. > > + continue; > > + > > + spin_lock(&info->ref_lock); > > + ASSERT(!cos || info->cos_ref[cos]); > > You've checked cos to be non-zero already above. > Yep. Thanks! > > + info->cos_ref[cos]--; > > + spin_unlock(&info->ref_lock); > > + > > + d->arch.psr_cos_ids[socket] = 0; > > + } > > Overall, while you say in the revision log that this was a suggestion of > mine, I don't recall any such (and I've just checked the v9 thread of > this patch without finding anything), and hence it's not really clear to > me why this is needed. After all you should be freeing info anyway We discussed this in v9 05 patch. I paste it below for your convenience to check. [Sun Yi]: > So, you think the MSRs values may not be valid after such process and > reloading (write MSRs to default value) is needed. If so, I would like > to do more operations in 'free_feature()': > 1. Iterate all domains working on the offline socket to change > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init > status. > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. > > These can make the socket's info be totally restored back to init status. [Jan] Yes, that's what I think is needed. > (albeit I can't see this happening, which would look to be a bug in > patch 5), so getting the refcounts adjusted seems pointless in any > event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not We only free resources in 'socket_info[socekt]' but do not free itself. Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]' is not a pointer that can be directly freed. socket_info = xzalloc_array(struct psr_socket_info, nr_sockets); That is the reason to reduce the 'info->cos_ref[cos]'. > certain - this may indeed by unavoidable, to match up with > psr_alloc_cos() using xzalloc. > > Furthermore I'm not at all convinced this is appropriate to do in the > context of a CPU_UP_CANCELED / CPU_DEAD notification: If you > have a few thousand VMs, the loop above may take a while. > Hmm, that may be a potential issue. I have two proposals below. Could you please help to check which one you prefer? Or provide another solution? 1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]' of all domains. The action is protected by 'ref_lock' to avoid confliction in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset the array to 0 in free_socket_resources(). 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket and can memset the array to 0 when socket is offline. But here is an issue that we do not know how many members this array should have. I cannot find a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use reallocation in 'psr_alloc_cos' if the newly created domain's id is bigger than current array number. 
> > @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int socket, > > return 0; > > } > > > > -int psr_set_l3_cbm(struct domain *d, unsigned int socket, > > - uint64_t cbm, enum cbm_type type) > > +/* Set value functions */ > > +static unsigned int get_cos_num(const struct psr_socket_info *info) > > { > > return 0; > > } > > > > +static int gather_val_array(uint32_t val[], > > + unsigned int array_len, > > + const struct psr_socket_info *info, > > + unsigned int old_cos) > > +{ > > + return -EINVAL; > > +} > > + > > +static int insert_val_to_array(uint32_t val[], > > As indicated before, I'm pretty convinced "into" would be more > natural than "to". > Got it. Thanks! > > + unsigned int array_len, > > + const struct psr_socket_info *info, > > + enum psr_feat_type feat_type, > > + enum cbm_type type, > > + uint32_t new_val) > > +{ > > + return -EINVAL; > > +} > > + > > +static int find_cos(const uint32_t val[], unsigned int array_len, > > + enum psr_feat_type feat_type, > > + const struct psr_socket_info *info, > > + spinlock_t *ref_lock) > > I don't think I did suggest adding another parameter here. The lock > is accessible from info, isn't it? In which case, as I _did_ suggest, > you should drop const from it instead. But ... > > > +{ > > + ASSERT(spin_is_locked(ref_lock)); > > ... the assertion seems rather pointless anyway with there just > being a single caller, which very visibly acquires the lock up front. > > > +static int pick_avail_cos(const struct psr_socket_info *info, > > + spinlock_t *ref_lock, > > Same here then. > Ok, I will drop this assertion and the parameter 'ref_lock'. > > +int psr_set_val(struct domain *d, unsigned int socket, > > + uint32_t val, enum cbm_type type) > > +{ > > + unsigned int old_cos; > > + int cos, ret; > > + unsigned int *ref; > > + uint32_t *val_array; > > + struct psr_socket_info *info = get_socket_info(socket); > > + unsigned int array_len; > > + enum psr_feat_type feat_type; > > + > > + if ( IS_ERR(info) ) > > + return PTR_ERR(info); > > + > > + feat_type = psr_cbm_type_to_feat_type(type); > > + if ( feat_type > ARRAY_SIZE(info->features) || > > I think I did point out the off-by-one mistake here in an earlier patch > already. > Sorry, I did not notice it. [...] > > + /* > > + * Step 5: > > + * Find the COS ID (find_cos result is '>= 0' or an available COS ID is > > + * picked, then update ref according to COS ID. > > + */ > > + ref[cos]++; > > + ASSERT(!cos || ref[cos]); > > + ASSERT(!old_cos || ref[old_cos]); > > + ref[old_cos]--; > > + spin_unlock(&info->ref_lock); > > + > > + /* > > + * Step 6: > > + * Save the COS ID into current domain's psr_cos_ids[] so that we can know > > + * which COS the domain is using on the socket. One domain can only use > > + * one COS ID at same time on each socket. > > + */ > > + d->arch.psr_cos_ids[socket] = cos; > > + > > Please put the "free_array" label here instead of duplicating the code > below. > Got it. Thx! > > + xfree(val_array); > > + return ret; > > + > > + unlock_free_array: > > + spin_unlock(&info->ref_lock); > > + free_array: > > + xfree(val_array); > > + return ret; > > +} > > + > > /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */ > > static void psr_free_cos(struct domain *d) > > { > > + unsigned int socket, cos; > > + > > + ASSERT(socket_info); > > + > > + if ( !d->arch.psr_cos_ids ) > > + return; > > + > > + /* Domain is destroied so its cos_ref should be decreased. 
*/ > > + for ( socket = 0; socket < nr_sockets; socket++ ) > > + { > > + struct psr_socket_info *info; > > + > > + /* cos 0 is default one which does not need be handled. */ > > + cos = d->arch.psr_cos_ids[socket]; > > + if ( cos == 0 ) > > + continue; > > + > > + info = socket_info + socket; > > + spin_lock(&info->ref_lock); > > + ASSERT(!cos || info->cos_ref[cos]); > > + info->cos_ref[cos]--; > > This recurring pattern of assertion and decrement could surely be > put in a helper function (and the for symmetry also for increment). > Ok, but if we use memset to restore 'cos_ref[]' per above comment, I think it is unnecessary. > Jan
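For reference, the off-by-one acknowledged above is in the feat_type bounds check: `>` still admits feat_type == ARRAY_SIZE(info->features), which would index one element past the end of the array. The corrected fragment of the quoted psr_set_val() would read:

```c
    feat_type = psr_cbm_type_to_feat_type(type);
    if ( feat_type >= ARRAY_SIZE(info->features) ||
         !info->features[feat_type] )
        return -ENOENT;
```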
>>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: > On 17-04-11 09:01:53, Jan Beulich wrote: >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: >> > + info->cos_ref[cos]--; >> > + spin_unlock(&info->ref_lock); >> > + >> > + d->arch.psr_cos_ids[socket] = 0; >> > + } >> >> Overall, while you say in the revision log that this was a suggestion of >> mine, I don't recall any such (and I've just checked the v9 thread of >> this patch without finding anything), and hence it's not really clear to >> me why this is needed. After all you should be freeing info anyway > > We discussed this in v9 05 patch. Ah, that's why I didn't find it. > I paste it below for your convenience to > check. > [Sun Yi]: > > So, you think the MSRs values may not be valid after such process and > > reloading (write MSRs to default value) is needed. If so, I would like > > to do more operations in 'free_feature()': > > 1. Iterate all domains working on the offline socket to change > > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init > > status. > > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. > > > > These can make the socket's info be totally restored back to init status. > > [Jan] > Yes, that's what I think is needed. > >> (albeit I can't see this happening, which would look to be a bug in >> patch 5), so getting the refcounts adjusted seems pointless in any >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not > > We only free resources in 'socket_info[socekt]' but do not free itself. > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]' > is not a pointer that can be directly freed. > socket_info = xzalloc_array(struct psr_socket_info, nr_sockets); > > That is the reason to reduce the 'info->cos_ref[cos]'. I see. But then there's no need to decrement it for each domain using it, you could simply flush it to zero. >> certain - this may indeed by unavoidable, to match up with >> psr_alloc_cos() using xzalloc. >> >> Furthermore I'm not at all convinced this is appropriate to do in the >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you >> have a few thousand VMs, the loop above may take a while. >> > Hmm, that may be a potential issue. I have two proposals below. Could you > please help to check which one you prefer? Or provide another solution? > > 1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]' > of all domains. The action is protected by 'ref_lock' to avoid confliction > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset > the array to 0 in free_socket_resources(). > > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket > and can memset the array to 0 when socket is offline. But here is an issue > that we do not know how many members this array should have. I cannot find > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use reallocation > in 'psr_alloc_cos' if the newly created domain's id is bigger than current > array number. The number of domains is limited by the special DOMID_* values. However, allocating an array with 32k entries doesn't sound very reasonable. Sadly the other solution doesn't look very attractive either, as there'll be quite a bit of synchronization needed (you'd have to defer the same socket being re-used by a CPU being onlined until you've done the cleanup). 
This may need some more thought, but I can't see myself finding time for this any time soon, so I'm afraid I'll have to leave it to you for now. Jan
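Jan's "simply flush it to zero" would amount to roughly the following fragment in free_socket_resources(), assuming cos_ref stays a fixed-size array embedded in struct psr_socket_info as in the patch:

```c
    /*
     * Sketch: drop all COS reference counts for the departing socket in
     * one go instead of decrementing them once per domain.
     */
    spin_lock(&info->ref_lock);
    memset(info->cos_ref, 0, sizeof(info->cos_ref));
    spin_unlock(&info->ref_lock);
```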
On 17-04-12 03:09:56, Jan Beulich wrote: > >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-11 09:01:53, Jan Beulich wrote: > >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > >> > + info->cos_ref[cos]--; > >> > + spin_unlock(&info->ref_lock); > >> > + > >> > + d->arch.psr_cos_ids[socket] = 0; > >> > + } > >> > >> Overall, while you say in the revision log that this was a suggestion of > >> mine, I don't recall any such (and I've just checked the v9 thread of > >> this patch without finding anything), and hence it's not really clear to > >> me why this is needed. After all you should be freeing info anyway > > > > We discussed this in v9 05 patch. > > Ah, that's why I didn't find it. > > > I paste it below for your convenience to > > check. > > [Sun Yi]: > > > So, you think the MSRs values may not be valid after such process and > > > reloading (write MSRs to default value) is needed. If so, I would like > > > to do more operations in 'free_feature()': > > > 1. Iterate all domains working on the offline socket to change > > > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init > > > status. > > > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. > > > > > > These can make the socket's info be totally restored back to init status. > > > > [Jan] > > Yes, that's what I think is needed. > > > >> (albeit I can't see this happening, which would look to be a bug in > >> patch 5), so getting the refcounts adjusted seems pointless in any > >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not > > > > We only free resources in 'socket_info[socekt]' but do not free itself. > > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]' > > is not a pointer that can be directly freed. > > socket_info = xzalloc_array(struct psr_socket_info, nr_sockets); > > > > That is the reason to reduce the 'info->cos_ref[cos]'. > > I see. But then there's no need to decrement it for each domain > using it, you could simply flush it to zero. > > >> certain - this may indeed by unavoidable, to match up with > >> psr_alloc_cos() using xzalloc. > >> > >> Furthermore I'm not at all convinced this is appropriate to do in the > >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you > >> have a few thousand VMs, the loop above may take a while. > >> > > Hmm, that may be a potential issue. I have two proposals below. Could you > > please help to check which one you prefer? Or provide another solution? > > > > 1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]' > > of all domains. The action is protected by 'ref_lock' to avoid confliction > > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset > > the array to 0 in free_socket_resources(). > > > > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index > > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket > > and can memset the array to 0 when socket is offline. But here is an issue > > that we do not know how many members this array should have. I cannot find > > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use reallocation > > in 'psr_alloc_cos' if the newly created domain's id is bigger than current > > array number. > > The number of domains is limited by the special DOMID_* values. > However, allocating an array with 32k entries doesn't sound very > reasonable. I think 32K entries should be the extreme case. I can allocate e.g. 100 entries when the first domain is created. 
If a new domain's id exceeds 100, reallocate another 100 entries. The total number of entries allocated should be less than 32K. This is a functional requirement which cannot be avoided. What do you think? > Sadly the other solution doesn't look very attractive > either, as there'll be quite a bit of synchronization needed (you'd > have to defer the same socket being re-used by a CPU being > onlined until you've done the cleanup). This may need some more > thought, but I can't see myself finding time for this any time soon, > so I'm afraid I'll have to leave it to you for now. > Yes, the first option needs careful consideration of the synchronization, which is more complex than in the second option. > Jan
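A rough sketch of the chunked reallocation described in option 2 above; the field names dom_cos_ids/dom_cos_ids_len, the helper name and the chunk size are all hypothetical, and a plain allocate-copy-free sequence is used rather than assuming any realloc-style interface:

```c
#define COS_IDS_CHUNK 100   /* hypothetical growth step */

/*
 * Sketch: grow a hypothetical per-socket array, indexed by domain ID, so
 * that it can hold an entry for domid.  dom_cos_ids and dom_cos_ids_len
 * are made-up members of struct psr_socket_info for illustration only.
 */
static int grow_dom_cos_ids(struct psr_socket_info *info, domid_t domid)
{
    unsigned int needed = (domid / COS_IDS_CHUNK + 1) * COS_IDS_CHUNK;
    unsigned int *new_arr;

    if ( domid < info->dom_cos_ids_len )
        return 0;

    new_arr = xzalloc_array(unsigned int, needed);
    if ( !new_arr )
        return -ENOMEM;

    if ( info->dom_cos_ids )
        memcpy(new_arr, info->dom_cos_ids,
               info->dom_cos_ids_len * sizeof(*new_arr));
    xfree(info->dom_cos_ids);
    info->dom_cos_ids = new_arr;
    info->dom_cos_ids_len = needed;

    return 0;
}
```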
>>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: > On 17-04-12 03:09:56, Jan Beulich wrote: >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: >> > On 17-04-11 09:01:53, Jan Beulich wrote: >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: >> >> > + info->cos_ref[cos]--; >> >> > + spin_unlock(&info->ref_lock); >> >> > + >> >> > + d->arch.psr_cos_ids[socket] = 0; >> >> > + } >> >> >> >> Overall, while you say in the revision log that this was a suggestion of >> >> mine, I don't recall any such (and I've just checked the v9 thread of >> >> this patch without finding anything), and hence it's not really clear to >> >> me why this is needed. After all you should be freeing info anyway >> > >> > We discussed this in v9 05 patch. >> >> Ah, that's why I didn't find it. >> >> > I paste it below for your convenience to >> > check. >> > [Sun Yi]: >> > > So, you think the MSRs values may not be valid after such process and >> > > reloading (write MSRs to default value) is needed. If so, I would like >> > > to do more operations in 'free_feature()': >> > > 1. Iterate all domains working on the offline socket to change >> > > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init >> > > status. >> > > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. >> > > >> > > These can make the socket's info be totally restored back to init > status. >> > >> > [Jan] >> > Yes, that's what I think is needed. >> > >> >> (albeit I can't see this happening, which would look to be a bug in >> >> patch 5), so getting the refcounts adjusted seems pointless in any >> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not >> > >> > We only free resources in 'socket_info[socekt]' but do not free itself. >> > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]' >> > is not a pointer that can be directly freed. >> > socket_info = xzalloc_array(struct psr_socket_info, nr_sockets); >> > >> > That is the reason to reduce the 'info->cos_ref[cos]'. >> >> I see. But then there's no need to decrement it for each domain >> using it, you could simply flush it to zero. >> >> >> certain - this may indeed by unavoidable, to match up with >> >> psr_alloc_cos() using xzalloc. >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you >> >> have a few thousand VMs, the loop above may take a while. >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you >> > please help to check which one you prefer? Or provide another solution? >> > >> > 1. Start a tasklet in free_socket_resources() to restore > 'psr_cos_ids[socket]' >> > of all domains. The action is protected by 'ref_lock' to avoid > confliction >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset >> > the array to 0 in free_socket_resources(). >> > >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket >> > and can memset the array to 0 when socket is offline. But here is an > issue >> > that we do not know how many members this array should have. I cannot > find >> > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use > reallocation >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than > current >> > array number. >> >> The number of domains is limited by the special DOMID_* values. 
>> However, allocating an array with 32k entries doesn't sound very >> reasonable. > > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries > when the first domain is created. If a new domain's id exceeds 100, reallocate > another 100 entries. The total number of entries allocated should be less than > 32K. This is a functional requirement which cannot be avoided. How do you > think? So how many entries would your array have once I start the 32,000th domain (having at any one time at most a single one running, besides Dom0)? Jan
On 17-04-12 06:42:01, Jan Beulich wrote: > >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-12 03:09:56, Jan Beulich wrote: > >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: > >> > On 17-04-11 09:01:53, Jan Beulich wrote: > >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > >> >> > + info->cos_ref[cos]--; > >> >> > + spin_unlock(&info->ref_lock); > >> >> > + > >> >> > + d->arch.psr_cos_ids[socket] = 0; > >> >> > + } > >> >> > >> >> Overall, while you say in the revision log that this was a suggestion of > >> >> mine, I don't recall any such (and I've just checked the v9 thread of > >> >> this patch without finding anything), and hence it's not really clear to > >> >> me why this is needed. After all you should be freeing info anyway > >> > > >> > We discussed this in v9 05 patch. > >> > >> Ah, that's why I didn't find it. > >> > >> > I paste it below for your convenience to > >> > check. > >> > [Sun Yi]: > >> > > So, you think the MSRs values may not be valid after such process and > >> > > reloading (write MSRs to default value) is needed. If so, I would like > >> > > to do more operations in 'free_feature()': > >> > > 1. Iterate all domains working on the offline socket to change > >> > > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init > >> > > status. > >> > > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. > >> > > > >> > > These can make the socket's info be totally restored back to init > > status. > >> > > >> > [Jan] > >> > Yes, that's what I think is needed. > >> > > >> >> (albeit I can't see this happening, which would look to be a bug in > >> >> patch 5), so getting the refcounts adjusted seems pointless in any > >> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not > >> > > >> > We only free resources in 'socket_info[socekt]' but do not free itself. > >> > Below is how we allocate 'socket_info'. So, the 'socket_info[socekt]' > >> > is not a pointer that can be directly freed. > >> > socket_info = xzalloc_array(struct psr_socket_info, nr_sockets); > >> > > >> > That is the reason to reduce the 'info->cos_ref[cos]'. > >> > >> I see. But then there's no need to decrement it for each domain > >> using it, you could simply flush it to zero. > >> > >> >> certain - this may indeed by unavoidable, to match up with > >> >> psr_alloc_cos() using xzalloc. > >> >> > >> >> Furthermore I'm not at all convinced this is appropriate to do in the > >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you > >> >> have a few thousand VMs, the loop above may take a while. > >> >> > >> > Hmm, that may be a potential issue. I have two proposals below. Could you > >> > please help to check which one you prefer? Or provide another solution? > >> > > >> > 1. Start a tasklet in free_socket_resources() to restore > > 'psr_cos_ids[socket]' > >> > of all domains. The action is protected by 'ref_lock' to avoid > > confliction > >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset > >> > the array to 0 in free_socket_resources(). > >> > > >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index > >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket > >> > and can memset the array to 0 when socket is offline. But here is an > > issue > >> > that we do not know how many members this array should have. I cannot > > find > >> > a macro something like 'DOMAIN_MAX_NUMBER'. 
So, I prefer to use > > reallocation > >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than > > current > >> > array number. > >> > >> The number of domains is limited by the special DOMID_* values. > >> However, allocating an array with 32k entries doesn't sound very > >> reasonable. > > > > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries > > when the first domain is created. If a new domain's id exceeds 100, reallocate > > another 100 entries. The total number of entries allocated should be less than > > 32K. This is a functional requirement which cannot be avoided. How do you > > think? > > So how many entries would your array have once I start the 32,000th > domain (having at any one time at most a single one running, besides > Dom0)? > In such case, we have to keep a 32K array because the domain_id is the index to access the array. But this array is per socket so the whole memory used should not be too much. After considering this issue more, I think the original codes might not be so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most 288 CPUs. So, I think the domains running at same time in reality may not be so many (no efficient resources). If this hypothesis is right, a loop to write 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am wrong, please correct me. Thanks!
>>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote: > On 17-04-12 06:42:01, Jan Beulich wrote: >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: >> > On 17-04-12 03:09:56, Jan Beulich wrote: >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: >> >> > On 17-04-11 09:01:53, Jan Beulich wrote: >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you >> >> >> have a few thousand VMs, the loop above may take a while. >> >> >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you >> >> > please help to check which one you prefer? Or provide another solution? >> >> > >> >> > 1. Start a tasklet in free_socket_resources() to restore >> > 'psr_cos_ids[socket]' >> >> > of all domains. The action is protected by 'ref_lock' to avoid >> > confliction >> >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset >> >> > the array to 0 in free_socket_resources(). >> >> > >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index >> >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket >> >> > and can memset the array to 0 when socket is offline. But here is an >> > issue >> >> > that we do not know how many members this array should have. I cannot >> > find >> >> > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use >> > reallocation >> >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than >> > current >> >> > array number. >> >> >> >> The number of domains is limited by the special DOMID_* values. >> >> However, allocating an array with 32k entries doesn't sound very >> >> reasonable. >> > >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries >> > when the first domain is created. If a new domain's id exceeds 100, reallocate >> > another 100 entries. The total number of entries allocated should be less than >> > 32K. This is a functional requirement which cannot be avoided. How do you >> > think? >> >> So how many entries would your array have once I start the 32,000th >> domain (having at any one time at most a single one running, besides >> Dom0)? >> > In such case, we have to keep a 32K array because the domain_id is the index to > access the array. But this array is per socket so the whole memory used should > not be too much. We carefully avoid any runtime allocations of order > 0, so if you were to set up such an array, you'd need to use vmalloc()/vzalloc(). But I continue to be unconvinced that we want such a large array in the first place. > After considering this issue more, I think the original codes might not be > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most > 288 CPUs. So, I think the domains running at same time in reality may not be > so many (no efficient resources). If this hypothesis is right, a loop to write > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am > wrong, please correct me. Thanks! What relationship does the number of CPUs have to the number of domains on a host? There could be thousands with just a few dozen CPUs, provided none or very few of them have high demands on CPU resources. Additionally please never forget that system sizes basically only ever grow. Plus we wouldn't want a latent issue here in case we ever end up needing to widen domain IDs beyond 16 bits. Jan
On 17-04-13 03:41:44, Jan Beulich wrote: > >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-12 06:42:01, Jan Beulich wrote: > >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: > >> > On 17-04-12 03:09:56, Jan Beulich wrote: > >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: > >> >> > On 17-04-11 09:01:53, Jan Beulich wrote: > >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the > >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you > >> >> >> have a few thousand VMs, the loop above may take a while. > >> >> >> > >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you > >> >> > please help to check which one you prefer? Or provide another solution? > >> >> > > >> >> > 1. Start a tasklet in free_socket_resources() to restore > >> > 'psr_cos_ids[socket]' > >> >> > of all domains. The action is protected by 'ref_lock' to avoid > >> > confliction > >> >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset > >> >> > the array to 0 in free_socket_resources(). > >> >> > > >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index > >> >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket > >> >> > and can memset the array to 0 when socket is offline. But here is an > >> > issue > >> >> > that we do not know how many members this array should have. I cannot > >> > find > >> >> > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use > >> > reallocation > >> >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than > >> > current > >> >> > array number. > >> >> > >> >> The number of domains is limited by the special DOMID_* values. > >> >> However, allocating an array with 32k entries doesn't sound very > >> >> reasonable. > >> > > >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 entries > >> > when the first domain is created. If a new domain's id exceeds 100, reallocate > >> > another 100 entries. The total number of entries allocated should be less than > >> > 32K. This is a functional requirement which cannot be avoided. How do you > >> > think? > >> > >> So how many entries would your array have once I start the 32,000th > >> domain (having at any one time at most a single one running, besides > >> Dom0)? > >> > > In such case, we have to keep a 32K array because the domain_id is the index to > > access the array. But this array is per socket so the whole memory used should > > not be too much. > > We carefully avoid any runtime allocations of order > 0, so if you > were to set up such an array, you'd need to use vmalloc()/vzalloc(). > But I continue to be unconvinced that we want such a large array > in the first place. > > > After considering this issue more, I think the original codes might not be > > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most > > 288 CPUs. So, I think the domains running at same time in reality may not be > > so many (no efficient resources). If this hypothesis is right, a loop to write > > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am > > wrong, please correct me. Thanks! > > What relationship does the number of CPUs have to the number of > domains on a host? There could be thousands with just a few dozen > CPUs, provided none or very few of them have high demands on > CPU resources. 
Additionally please never forget that system sizes > basically only ever grow. Plus we wouldn't want a latent issue here > in case we ever end up needing to widen domain IDs beyond 16 bits. > How about a per socket array like this: uint32_t domain_switch[1024]; Every bit represents a domain id. Then, we can handle this case as below: 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to cover socket offline case. We do not need to clear it in 'free_socket_resources'. 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set the bit to 1 according to domain_id. If the old value is 0 and the 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. Then, we only use 4KB for one socket.
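A rough rendering of the proposal; only the name domain_switch and the 4KB sizing come from the message above, while the placement in struct psr_socket_info and the surrounding code are assumptions:

```c
/*
 * Sketch of the proposed per-socket bitmap: one bit per domain ID,
 * 1024 * 32 bits = 4KB per socket.  Placing it as a new member of
 * struct psr_socket_info is an assumption of this sketch.
 */
struct psr_socket_info {
    /* ... existing members (features[], ref_lock, cos_ref[], ...) ... */
    uint32_t domain_switch[1024];
};

/* Step 3, in psr_set_val(): mark the domain as seen on this socket, then
 * record the picked COS ID as before (the find/pick flow is unchanged). */
test_and_set_bit(d->domain_id, info->domain_switch);
d->arch.psr_cos_ids[socket] = cos;
```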
>>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote: > On 17-04-13 03:41:44, Jan Beulich wrote: >> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote: >> > On 17-04-12 06:42:01, Jan Beulich wrote: >> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: >> >> > On 17-04-12 03:09:56, Jan Beulich wrote: >> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote: >> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you >> >> >> >> have a few thousand VMs, the loop above may take a while. >> >> >> >> >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you >> >> >> > please help to check which one you prefer? Or provide another solution? >> >> >> > >> >> >> > 1. Start a tasklet in free_socket_resources() to restore >> >> > 'psr_cos_ids[socket]' >> >> >> > of all domains. The action is protected by 'ref_lock' to avoid >> >> > confliction >> >> >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or > memset >> >> >> > the array to 0 in free_socket_resources(). >> >> >> > >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change > index >> >> >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per > socket >> >> >> > and can memset the array to 0 when socket is offline. But here is an >> >> > issue >> >> >> > that we do not know how many members this array should have. I cannot >> >> > find >> >> >> > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use >> >> > reallocation >> >> >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than >> >> > current >> >> >> > array number. >> >> >> >> >> >> The number of domains is limited by the special DOMID_* values. >> >> >> However, allocating an array with 32k entries doesn't sound very >> >> >> reasonable. >> >> > >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 > entries >> >> > when the first domain is created. If a new domain's id exceeds 100, > reallocate >> >> > another 100 entries. The total number of entries allocated should be less > than >> >> > 32K. This is a functional requirement which cannot be avoided. How do you >> >> > think? >> >> >> >> So how many entries would your array have once I start the 32,000th >> >> domain (having at any one time at most a single one running, besides >> >> Dom0)? >> >> >> > In such case, we have to keep a 32K array because the domain_id is the > index to >> > access the array. But this array is per socket so the whole memory used > should >> > not be too much. >> >> We carefully avoid any runtime allocations of order > 0, so if you >> were to set up such an array, you'd need to use vmalloc()/vzalloc(). >> But I continue to be unconvinced that we want such a large array >> in the first place. >> >> > After considering this issue more, I think the original codes might not be >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most >> > 288 CPUs. So, I think the domains running at same time in reality may not > be >> > so many (no efficient resources). If this hypothesis is right, a loop to > write >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am >> > wrong, please correct me. Thanks! 
>> >> What relationship does the number of CPUs have to the number of >> domains on a host? There could be thousands with just a few dozen >> CPUs, provided none or very few of them have high demands on >> CPU resources. Additionally please never forget that system sizes >> basically only ever grow. Plus we wouldn't want a latent issue here >> in case we ever end up needing to widen domain IDs beyond 16 bits. >> > How about a per socket array like this: > uint32_t domain_switch[1024]; > > Every bit represents a domain id. Then, we can handle this case as below: > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to > cover socket offline case. We do not need to clear it in > 'free_socket_resources'. > > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set > the bit to 1 according to domain_id. If the old value is 0 and the > 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. > > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit > to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. > > Then, we only use 4KB for one socket. This looks to come closer to something I'd consider acceptable, but I may not understand your intentions in full yet: For one, there's nowhere you clear the bit (other than presumably during socket cleanup). And then I don't understand the test_and_ parts of the constructs above, i.e. you don't clarify what the return values would be used/needed for. Jan
On 17-04-13 04:58:06, Jan Beulich wrote: > >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-13 03:41:44, Jan Beulich wrote: > >> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote: > >> > On 17-04-12 06:42:01, Jan Beulich wrote: > >> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: > >> >> > On 17-04-12 03:09:56, Jan Beulich wrote: > >> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: > >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote: > >> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the > >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you > >> >> >> >> have a few thousand VMs, the loop above may take a while. > >> >> >> >> > >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you > >> >> >> > please help to check which one you prefer? Or provide another solution? > >> >> >> > > >> >> >> > 1. Start a tasklet in free_socket_resources() to restore > >> >> > 'psr_cos_ids[socket]' > >> >> >> > of all domains. The action is protected by 'ref_lock' to avoid > >> >> > confliction > >> >> >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or > > memset > >> >> >> > the array to 0 in free_socket_resources(). > >> >> >> > > >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change > > index > >> >> >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per > > socket > >> >> >> > and can memset the array to 0 when socket is offline. But here is an > >> >> > issue > >> >> >> > that we do not know how many members this array should have. I cannot > >> >> > find > >> >> >> > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use > >> >> > reallocation > >> >> >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than > >> >> > current > >> >> >> > array number. > >> >> >> > >> >> >> The number of domains is limited by the special DOMID_* values. > >> >> >> However, allocating an array with 32k entries doesn't sound very > >> >> >> reasonable. > >> >> > > >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 > > entries > >> >> > when the first domain is created. If a new domain's id exceeds 100, > > reallocate > >> >> > another 100 entries. The total number of entries allocated should be less > > than > >> >> > 32K. This is a functional requirement which cannot be avoided. How do you > >> >> > think? > >> >> > >> >> So how many entries would your array have once I start the 32,000th > >> >> domain (having at any one time at most a single one running, besides > >> >> Dom0)? > >> >> > >> > In such case, we have to keep a 32K array because the domain_id is the > > index to > >> > access the array. But this array is per socket so the whole memory used > > should > >> > not be too much. > >> > >> We carefully avoid any runtime allocations of order > 0, so if you > >> were to set up such an array, you'd need to use vmalloc()/vzalloc(). > >> But I continue to be unconvinced that we want such a large array > >> in the first place. > >> > >> > After considering this issue more, I think the original codes might not be > >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most > >> > 288 CPUs. So, I think the domains running at same time in reality may not > > be > >> > so many (no efficient resources). 
If this hypothesis is right, a loop to > > write > >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am > >> > wrong, please correct me. Thanks! > >> > >> What relationship does the number of CPUs have to the number of > >> domains on a host? There could be thousands with just a few dozen > >> CPUs, provided none or very few of them have high demands on > >> CPU resources. Additionally please never forget that system sizes > >> basically only ever grow. Plus we wouldn't want a latent issue here > >> in case we ever end up needing to widen domain IDs beyond 16 bits. > >> > > How about a per socket array like this: > > uint32_t domain_switch[1024]; > > > > Every bit represents a domain id. Then, we can handle this case as below: > > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to > > cover socket offline case. We do not need to clear it in > > 'free_socket_resources'. > > > > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set > > the bit to 1 according to domain_id. If the old value is 0 and the > > 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. > > > > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit > > to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. > > > > Then, we only use 4KB for one socket. > > This looks to come closer to something I'd consider acceptable, but > I may not understand your intentions in full yet: For one, there's > nowhere you clear the bit (other than presumably during socket > cleanup). Actually, clear the array in 'free_socket_resources' has same effect. I can move clear action into it. > And then I don't understand the test_and_ parts of the > constructs above, i.e. you don't clarify what the return values > would be used/needed for. > Sorry, 0 means this domain has not been scheduled to the socket yet. If test_and_ returns 0, that is the first time the domain runs on the socket (the first time the socket is online). So, we need restore 'psr_cos_ids[socket]' to 0 in 'psr_ctxt_switch_to()'.
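Putting the clarified semantics together, step 2 inside psr_ctxt_switch_to() might look like the sketch below; the socket lookup and the local variables are assumptions:

```c
    unsigned int socket = cpu_to_socket(smp_processor_id());
    struct psr_socket_info *info = socket_info + socket;

    /*
     * An old bit value of 0 means this is the first time the domain runs
     * on the socket since it came (back) online, so a COS ID recorded
     * before the socket went offline is stale and is reset to COS 0.
     */
    if ( !test_and_set_bit(d->domain_id, info->domain_switch) &&
         d->arch.psr_cos_ids[socket] )
        d->arch.psr_cos_ids[socket] = 0;
```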
On 17-04-13 19:11:54, Yi Sun wrote: > On 17-04-13 04:58:06, Jan Beulich wrote: > > >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote: > > > On 17-04-13 03:41:44, Jan Beulich wrote: > > >> >>> On 13.04.17 at 10:11, <yi.y.sun@linux.intel.com> wrote: > > >> > On 17-04-12 06:42:01, Jan Beulich wrote: > > >> >> >>> On 12.04.17 at 14:23, <yi.y.sun@linux.intel.com> wrote: > > >> >> > On 17-04-12 03:09:56, Jan Beulich wrote: > > >> >> >> >>> On 12.04.17 at 07:53, <yi.y.sun@linux.intel.com> wrote: > > >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote: > > >> >> >> >> >>> On 01.04.17 at 15:53, <yi.y.sun@linux.intel.com> wrote: > > >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the > > >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you > > >> >> >> >> have a few thousand VMs, the loop above may take a while. > > >> >> >> >> > > >> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could you > > >> >> >> > please help to check which one you prefer? Or provide another solution? > > >> >> >> > > > >> >> >> > 1. Start a tasklet in free_socket_resources() to restore > > >> >> > 'psr_cos_ids[socket]' > > >> >> >> > of all domains. The action is protected by 'ref_lock' to avoid > > >> >> > confliction > > >> >> >> > in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or > > > memset > > >> >> >> > the array to 0 in free_socket_resources(). > > >> >> >> > > > >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change > > > index > > >> >> >> > from 'socket' to 'domain_id'. So we keep all domains' COS IDs per > > > socket > > >> >> >> > and can memset the array to 0 when socket is offline. But here is an > > >> >> > issue > > >> >> >> > that we do not know how many members this array should have. I cannot > > >> >> > find > > >> >> >> > a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use > > >> >> > reallocation > > >> >> >> > in 'psr_alloc_cos' if the newly created domain's id is bigger than > > >> >> > current > > >> >> >> > array number. > > >> >> >> > > >> >> >> The number of domains is limited by the special DOMID_* values. > > >> >> >> However, allocating an array with 32k entries doesn't sound very > > >> >> >> reasonable. > > >> >> > > > >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 > > > entries > > >> >> > when the first domain is created. If a new domain's id exceeds 100, > > > reallocate > > >> >> > another 100 entries. The total number of entries allocated should be less > > > than > > >> >> > 32K. This is a functional requirement which cannot be avoided. How do you > > >> >> > think? > > >> >> > > >> >> So how many entries would your array have once I start the 32,000th > > >> >> domain (having at any one time at most a single one running, besides > > >> >> Dom0)? > > >> >> > > >> > In such case, we have to keep a 32K array because the domain_id is the > > > index to > > >> > access the array. But this array is per socket so the whole memory used > > > should > > >> > not be too much. > > >> > > >> We carefully avoid any runtime allocations of order > 0, so if you > > >> were to set up such an array, you'd need to use vmalloc()/vzalloc(). > > >> But I continue to be unconvinced that we want such a large array > > >> in the first place. > > >> > > >> > After considering this issue more, I think the original codes might not be > > >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most > > >> > 288 CPUs. 
So, I think the domains running at same time in reality may not > > > be > > >> > so many (no efficient resources). If this hypothesis is right, a loop to > > > write > > >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am > > >> > wrong, please correct me. Thanks! > > >> > > >> What relationship does the number of CPUs have to the number of > > >> domains on a host? There could be thousands with just a few dozen > > >> CPUs, provided none or very few of them have high demands on > > >> CPU resources. Additionally please never forget that system sizes > > >> basically only ever grow. Plus we wouldn't want a latent issue here > > >> in case we ever end up needing to widen domain IDs beyond 16 bits. > > >> > > > How about a per socket array like this: > > > uint32_t domain_switch[1024]; > > > > > > Every bit represents a domain id. Then, we can handle this case as below: > > > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to > > > cover socket offline case. We do not need to clear it in > > > 'free_socket_resources'. > > > > > > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set > > > the bit to 1 according to domain_id. If the old value is 0 and the > > > 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. > > > > > > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit > > > to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. > > > > > > Then, we only use 4KB for one socket. > > > > This looks to come closer to something I'd consider acceptable, but > > I may not understand your intentions in full yet: For one, there's > > nowhere you clear the bit (other than presumably during socket > > cleanup). > > Actually, clear the array in 'free_socket_resources' has same effect. I can > move clear action into it. > > > And then I don't understand the test_and_ parts of the > > constructs above, i.e. you don't clarify what the return values > > would be used/needed for. > > > Sorry, 0 means this domain has not been scheduled to the socket yet. If > test_and_ returns 0, that is the first time the domain runs on the socket > (the first time the socket is online). So, we need restore 'psr_cos_ids[socket]' ^ missed 'after'. I mean the first time the domain is scheduled to the socket after the socket is online. > to 0 in 'psr_ctxt_switch_to()'. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
>>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote: > On 17-04-13 04:58:06, Jan Beulich wrote: >> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote: >> > How about a per socket array like this: >> > uint32_t domain_switch[1024]; >> > >> > Every bit represents a domain id. Then, we can handle this case as below: >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to >> > cover socket offline case. We do not need to clear it in >> > 'free_socket_resources'. >> > >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set >> > the bit to 1 according to domain_id. If the old value is 0 and the >> > 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. >> > >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit >> > to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. >> > >> > Then, we only use 4KB for one socket. >> >> This looks to come closer to something I'd consider acceptable, but >> I may not understand your intentions in full yet: For one, there's >> nowhere you clear the bit (other than presumably during socket >> cleanup). > > Actually, clear the array in 'free_socket_resources' has same effect. I can > move clear action into it. That wasn't my point - I was asking about clearing individual bits. Point being that if you only ever set bits in the map, you'll likely end up iterating through all active domains anyway. Jan
On 17-04-13 05:31:41, Jan Beulich wrote: > >>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-13 04:58:06, Jan Beulich wrote: > >> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote: > >> > How about a per socket array like this: > >> > uint32_t domain_switch[1024]; > >> > > >> > Every bit represents a domain id. Then, we can handle this case as below: > >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to > >> > cover socket offline case. We do not need to clear it in > >> > 'free_socket_resources'. > >> > > >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set > >> > the bit to 1 according to domain_id. If the old value is 0 and the > >> > 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. > >> > > >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit > >> > to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. > >> > > >> > Then, we only use 4KB for one socket. > >> > >> This looks to come closer to something I'd consider acceptable, but > >> I may not understand your intentions in full yet: For one, there's > >> nowhere you clear the bit (other than presumably during socket > >> cleanup). > > > > Actually, clear the array in 'free_socket_resources' has same effect. I can > > move clear action into it. > > That wasn't my point - I was asking about clearing individual bits. > Point being that if you only ever set bits in the map, you'll likely > end up iterating through all active domains anyway. > If entering 'free_socket_resources', that means no more actions to the array on this socket except clearing it. Can I just memset this array of the socekt to 0? > Jan
>>> On 13.04.17 at 13:44, <yi.y.sun@linux.intel.com> wrote: > On 17-04-13 05:31:41, Jan Beulich wrote: >> >>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote: >> > On 17-04-13 04:58:06, Jan Beulich wrote: >> >> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote: >> >> > How about a per socket array like this: >> >> > uint32_t domain_switch[1024]; >> >> > >> >> > Every bit represents a domain id. Then, we can handle this case as below: >> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to >> >> > cover socket offline case. We do not need to clear it in >> >> > 'free_socket_resources'. >> >> > >> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set >> >> > the bit to 1 according to domain_id. If the old value is 0 and the >> >> > 'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0. >> >> > >> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit >> >> > to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow. >> >> > >> >> > Then, we only use 4KB for one socket. >> >> >> >> This looks to come closer to something I'd consider acceptable, but >> >> I may not understand your intentions in full yet: For one, there's >> >> nowhere you clear the bit (other than presumably during socket >> >> cleanup). >> > >> > Actually, clear the array in 'free_socket_resources' has same effect. I can >> > move clear action into it. >> >> That wasn't my point - I was asking about clearing individual bits. >> Point being that if you only ever set bits in the map, you'll likely >> end up iterating through all active domains anyway. >> > If entering 'free_socket_resources', that means no more actions to > the array on this socket except clearing it. Can I just memset this array > of the socekt to 0? You can, afaict, but unless first you act on the set bits I can't see why you would want to track the bits in the first place. Or maybe I'm still not understanding your intention, in which case I guess the best you can do is simply implement your plan, and we'll discuss it in v11 review. Jan
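For illustration only, a minimal sketch of the per-socket bitmap approach discussed above might look as follows. This is neither the posted patch nor the eventual v11 code; the 'domain_switch' member of struct psr_socket_info and the helper names are assumptions taken from the proposal in this thread:

    /* One bit per domain ID: 32768 bits = 4KB per socket (assumed layout). */
    #define PSR_MAX_DOMID 32768

    /* Socket offline: nothing else can touch the map any more, so simply wipe it. */
    static void clear_domain_switch(struct psr_socket_info *info)
    {
        memset(info->domain_switch, 0, PSR_MAX_DOMID / 8);
    }

    /* Schedule-in path on a CPU belonging to 'socket'. */
    static void check_first_run_on_socket(struct domain *d, unsigned int socket,
                                          struct psr_socket_info *info)
    {
        /*
         * A clear bit means this is the first time the domain runs on the
         * socket since the socket (re-)came online, so any COS ID left over
         * from before the offline is stale and must be reset to 0.
         */
        if ( !test_and_set_bit(d->domain_id, info->domain_switch) &&
             d->arch.psr_cos_ids[socket] )
            d->arch.psr_cos_ids[socket] = 0;
    }

psr_set_val() would set the domain's bit in the same way (step 3 of the proposal) before updating psr_cos_ids[socket], so the context-switch path never resets a value that was just written.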
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index dc213a7..6ed71e2 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1437,21 +1437,33 @@ long arch_do_domctl( switch ( domctl->u.psr_cat_op.cmd ) { case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM: - ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target, - domctl->u.psr_cat_op.data, - PSR_CBM_TYPE_L3); + if ( domctl->u.psr_cat_op.data != + (uint32_t)domctl->u.psr_cat_op.data ) + return -EINVAL; + + ret = psr_set_val(d, domctl->u.psr_cat_op.target, + domctl->u.psr_cat_op.data, + PSR_CBM_TYPE_L3); break; case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE: - ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target, - domctl->u.psr_cat_op.data, - PSR_CBM_TYPE_L3_CODE); + if ( domctl->u.psr_cat_op.data != + (uint32_t)domctl->u.psr_cat_op.data ) + return -EINVAL; + + ret = psr_set_val(d, domctl->u.psr_cat_op.target, + domctl->u.psr_cat_op.data, + PSR_CBM_TYPE_L3_CODE); break; case XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA: - ret = psr_set_l3_cbm(d, domctl->u.psr_cat_op.target, - domctl->u.psr_cat_op.data, - PSR_CBM_TYPE_L3_DATA); + if ( domctl->u.psr_cat_op.data != + (uint32_t)domctl->u.psr_cat_op.data ) + return -EINVAL; + + ret = psr_set_val(d, domctl->u.psr_cat_op.target, + domctl->u.psr_cat_op.data, + PSR_CBM_TYPE_L3_DATA); break; case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM: diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 25fcd21..9d805d6 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket) { unsigned int i; struct psr_socket_info *info = socket_info + socket; + struct domain *d; if ( !info ) return; + /* Restore domain cos id to 0 when socket is offline. */ + for_each_domain ( d ) + { + unsigned int cos = d->arch.psr_cos_ids[socket]; + if ( cos == 0 ) + continue; + + spin_lock(&info->ref_lock); + ASSERT(!cos || info->cos_ref[cos]); + info->cos_ref[cos]--; + spin_unlock(&info->ref_lock); + + d->arch.psr_cos_ids[socket] = 0; + } + /* * Free resources of features. The global feature object, e.g. feat_l3_cat, * may not be freed here if it is not added into array. It is simply being @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int socket, return 0; } -int psr_set_l3_cbm(struct domain *d, unsigned int socket, - uint64_t cbm, enum cbm_type type) +/* Set value functions */ +static unsigned int get_cos_num(const struct psr_socket_info *info) { return 0; } +static int gather_val_array(uint32_t val[], + unsigned int array_len, + const struct psr_socket_info *info, + unsigned int old_cos) +{ + return -EINVAL; +} + +static int insert_val_to_array(uint32_t val[], + unsigned int array_len, + const struct psr_socket_info *info, + enum psr_feat_type feat_type, + enum cbm_type type, + uint32_t new_val) +{ + return -EINVAL; +} + +static int find_cos(const uint32_t val[], unsigned int array_len, + enum psr_feat_type feat_type, + const struct psr_socket_info *info, + spinlock_t *ref_lock) +{ + ASSERT(spin_is_locked(ref_lock)); + + return -ENOENT; +} + +static int pick_avail_cos(const struct psr_socket_info *info, + spinlock_t *ref_lock, + const uint32_t val[], unsigned int array_len, + unsigned int old_cos, + enum psr_feat_type feat_type) +{ + ASSERT(spin_is_locked(ref_lock)); + + return -ENOENT; +} + +static int write_psr_msr(unsigned int socket, unsigned int cos, + uint32_t val, enum psr_feat_type feat_type) +{ + return -ENOENT; +} + +/* The whole set process is protected by domctl_lock. 
*/ +int psr_set_val(struct domain *d, unsigned int socket, + uint32_t val, enum cbm_type type) +{ + unsigned int old_cos; + int cos, ret; + unsigned int *ref; + uint32_t *val_array; + struct psr_socket_info *info = get_socket_info(socket); + unsigned int array_len; + enum psr_feat_type feat_type; + + if ( IS_ERR(info) ) + return PTR_ERR(info); + + feat_type = psr_cbm_type_to_feat_type(type); + if ( feat_type > ARRAY_SIZE(info->features) || + !info->features[feat_type] ) + return -ENOENT; + + /* + * Step 0: + * old_cos means the COS ID current domain is using. By default, it is 0. + * + * For every COS ID, there is a reference count to record how many domains + * are using the COS register corresponding to this COS ID. + * - If ref[old_cos] is 0, that means this COS is not used by any domain. + * - If ref[old_cos] is 1, that means this COS is only used by current + * domain. + * - If ref[old_cos] is more than 1, that mean multiple domains are using + * this COS. + */ + old_cos = d->arch.psr_cos_ids[socket]; + ASSERT(old_cos < MAX_COS_REG_CNT); + + ref = info->cos_ref; + + /* + * Step 1: + * Gather a value array to store all features cos_reg_val[old_cos]. + * And, set the input new val into array according to the feature's + * position in array. + */ + array_len = get_cos_num(info); + val_array = xzalloc_array(uint32_t, array_len); + if ( !val_array ) + return -ENOMEM; + + if ( (ret = gather_val_array(val_array, array_len, info, old_cos)) != 0 ) + goto free_array; + + if ( (ret = insert_val_to_array(val_array, array_len, info, + feat_type, type, val)) != 0 ) + goto free_array; + + spin_lock(&info->ref_lock); + + /* + * Step 2: + * Try to find if there is already a COS ID on which all features' values + * are same as the array. Then, we can reuse this COS ID. + */ + cos = find_cos(val_array, array_len, feat_type, info, &info->ref_lock); + if ( cos == old_cos ) + { + ret = 0; + goto unlock_free_array; + } + + /* + * Step 3: + * If fail to find, we need pick an available COS ID. + * In fact, only COS ID which ref is 1 or 0 can be picked for current + * domain. If old_cos is not 0 and its ref==1, that means only current + * domain is using this old_cos ID. So, this old_cos ID certainly can + * be reused by current domain. Ref==0 means there is no any domain + * using this COS ID. So it can be used for current domain too. + */ + if ( cos < 0 ) + { + cos = pick_avail_cos(info, &info->ref_lock, val_array, + array_len, old_cos, feat_type); + if ( cos < 0 ) + { + ret = cos; + goto unlock_free_array; + } + + /* + * Step 4: + * Write all features MSRs according to the COS ID. + */ + ret = write_psr_msr(socket, cos, val, feat_type); + if ( ret ) + goto unlock_free_array; + } + + /* + * Step 5: + * Find the COS ID (find_cos result is '>= 0' or an available COS ID is + * picked, then update ref according to COS ID. + */ + ref[cos]++; + ASSERT(!cos || ref[cos]); + ASSERT(!old_cos || ref[old_cos]); + ref[old_cos]--; + spin_unlock(&info->ref_lock); + + /* + * Step 6: + * Save the COS ID into current domain's psr_cos_ids[] so that we can know + * which COS the domain is using on the socket. One domain can only use + * one COS ID at same time on each socket. 
+ */ + d->arch.psr_cos_ids[socket] = cos; + + xfree(val_array); + return ret; + + unlock_free_array: + spin_unlock(&info->ref_lock); + free_array: + xfree(val_array); + return ret; +} + /* Called with domain lock held, no extra lock needed for 'psr_cos_ids' */ static void psr_free_cos(struct domain *d) { + unsigned int socket, cos; + + ASSERT(socket_info); + + if ( !d->arch.psr_cos_ids ) + return; + + /* Domain is destroied so its cos_ref should be decreased. */ + for ( socket = 0; socket < nr_sockets; socket++ ) + { + struct psr_socket_info *info; + + /* cos 0 is default one which does not need be handled. */ + cos = d->arch.psr_cos_ids[socket]; + if ( cos == 0 ) + continue; + + info = socket_info + socket; + spin_lock(&info->ref_lock); + ASSERT(!cos || info->cos_ref[cos]); + info->cos_ref[cos]--; + spin_unlock(&info->ref_lock); + } + xfree(d->arch.psr_cos_ids); d->arch.psr_cos_ids = NULL; } diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h index 7c6d38a..66d5218 100644 --- a/xen/include/asm-x86/psr.h +++ b/xen/include/asm-x86/psr.h @@ -73,8 +73,8 @@ int psr_get_info(unsigned int socket, enum cbm_type type, uint32_t data[], unsigned int array_len); int psr_get_val(struct domain *d, unsigned int socket, uint32_t *val, enum cbm_type type); -int psr_set_l3_cbm(struct domain *d, unsigned int socket, - uint64_t cbm, enum cbm_type type); +int psr_set_val(struct domain *d, unsigned int socket, + uint32_t val, enum cbm_type type); int psr_domain_init(struct domain *d); void psr_domain_free(struct domain *d);
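All of the helpers above (get_cos_num(), gather_val_array(), insert_val_to_array(), find_cos(), pick_avail_cos(), write_psr_msr()) are deliberately left as stubs in this patch and are filled in by later patches of the series. As a rough illustration only, and not the eventual in-tree implementation, find_cos() is expected to walk the COS IDs and reuse the first one whose per-feature register values all equal the gathered value array; the two-dimensional 'stored' parameter below stands in for the per-feature COS register cache and is purely hypothetical:

    /*
     * Conceptual sketch: a COS ID may be reused iff every feature's currently
     * programmed value for that COS matches the gathered value array.
     */
    static int find_matching_cos(const uint32_t val[], unsigned int array_len,
                                 const uint32_t stored[][MAX_COS_REG_CNT],
                                 unsigned int cos_max)
    {
        unsigned int cos, i;

        for ( cos = 0; cos <= cos_max; cos++ )
        {
            for ( i = 0; i < array_len; i++ )
                if ( stored[i][cos] != val[i] )
                    break;

            if ( i == array_len )
                return cos;    /* all features match - reuse this COS ID */
        }

        return -ENOENT;
    }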
As the set-value flow is the most complicated one in PSR, it is split across several patches to keep things clear. This patch implements the set-value framework to show the whole picture first. It also changes the domctl interface to make it more general.

To keep the set-value flow general and able to support multiple features at the same time, it consists of the steps below:
1. Get the COS ID the current domain is using.
2. Gather a value array holding all features' current values, and replace the entry of the feature being set with the new input value.
3. Check whether there is already a COS ID for which all features' values are the same as the array. If so, that COS ID can be reused.
4. If no match is found, pick an available COS ID. Only a COS ID whose ref is 0 or 1 may be picked.
5. Write the feature's MSR according to the COS ID and cbm_type.
6. Update ref according to the COS ID.
7. Save the COS ID into the current domain's psr_cos_ids[socket] so that we know which COS the domain is using on the socket.

Accordingly, some functions are abstracted here, and the callback functions will be implemented in subsequent patches.

Here is an example to illustrate the process. The CPU supports two features, e.g. L3 CAT and L2 CAT. The user wants to set L3 CAT of Dom1 to 0x1ff.

1. Initially, the old_cos of Dom1 is 0. The COS register values at this point are:

               -------------------------------
               | COS 0 | COS 1 | COS 2 | ... |
               -------------------------------
        L3 CAT | 0x7ff | 0x7ff | 0x7ff | ... |
               -------------------------------
        L2 CAT | 0xff  | 0xff  | 0xff  | ... |
               -------------------------------

2. Gather the value array and insert the new value into it:
        val[0]: 0x1ff
        val[1]: 0xff
3. No matching COS can be found.
4. Pick COS 1 to store the value set.
5. Write the L3 CAT COS 1 registers. The COS register values are now:

               -------------------------------
               | COS 0 | COS 1 | COS 2 | ... |
               -------------------------------
        L3 CAT | 0x7ff | 0x1ff | ...   | ... |
               -------------------------------
        L2 CAT | 0xff  | 0xff  | ...   | ... |
               -------------------------------

6. ref[1] is increased to 1 because Dom1 is now using it.
7. Save 1 to Dom1's psr_cos_ids[socket].

Then the user wants to set L3 CAT of Dom2 to 0x1ff as well. The old_cos of Dom2 is also 0. Repeating the flow above, the assembled val array is:
        val[0]: 0x1ff
        val[1]: 0xff
This time a matching COS, COS 1, is found, so it is reused for Dom2. ref[1] is increased to 2 because both Dom1 and Dom2 are now using this COS ID. 1 is saved to Dom2's psr_cos_ids[socket].

Another point that needs emphasis is the context switch. When a context switch happens, 'psr_ctxt_switch_to' is called to get the domain's COS ID from 'psr_cos_ids[socket]'. But 'psr_cos_ids[socket]' is only set in step 7 above, so there are three scenarios:
1. The user calls the domctl interface on Dom0 to set COS ID 1 for Dom1 into its psr_cos_ids[]. Then Dom1 is scheduled, 'psr_ctxt_switch_to()' is called, and COS ID 1 takes effect. No action is needed for this case.
2. Dom1 runs on CPU 1 with COS ID 1 in effect. At the same time, the user calls the domctl interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[]. When the time slice ends and Dom1 is scheduled again, the new COS ID 2 takes effect.
3. While a new COS ID is being written into psr_cos_ids[], 'psr_ctxt_switch_to()' may be called and access the same psr_cos_ids[] member through 'psr_assoc_cos'. The COS ID is constrained by cos_mask so that it cannot exceed cos_max. So even if the COS ID read here is stale, it is still a usable ID (within cos_max).
In all these cases the functionality still works: at worst, a different but still valid CBM is in effect for a short time, and the correct CBM takes effect on the next schedule. None of these cases causes a race condition or harms the system. The PSR features set cache capacity for a domain, and a cache setting only becomes effective progressively; by the time it really takes effect, the time slice in which the domain was scheduled may already have passed. So even if a stale COS ID is used to set ASSOC, only another valid CBM is in effect for a short time while the cache is being prepared, and the correct COS ID takes effect shortly afterwards. This does not materially affect the cache capacity setting.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
v10:
 - restore domain cos id to 0 when socket is offline. (suggested by Jan Beulich)
 - check 'psr_cat_op.data' to make sure only lower 32 bits are valid. (suggested by Jan Beulich)
 - remove unnecessary fixed width type of parameters and variables. (suggested by Jan Beulich)
 - rename 'insert_new_val_to_array' to 'insert_val_to_array'. (suggested by Jan Beulich)
 - input 'ref_lock' pointer into functions to check if it has been locked. (suggested by Jan Beulich)
 - add comment to declare the set process is protected by 'domctl_lock'. (suggested by Jan Beulich)
 - check 'feat_type'. (suggested by Jan Beulich)
 - remove 'feat_mask'. (suggested by Jan Beulich)
 - remove unnecessary criteria of ASSERT. (suggested by Jan Beulich)
 - adjust flow of 'psr_set_val' to avoid 'goto' for successful cases. (suggested by Jan Beulich)
 - use ASSERT to check 'socket_info' in 'psr_free_cos'. (suggested by Jan Beulich)
 - remove unnecessary comment in 'psr_free_cos'. (suggested by Jan Beulich)
v9:
 - use goto style error handling in 'psr_set_val'. (suggested by Wei Liu)
 - use ASSERT for checking old_cos. (suggested by Wei Liu and Jan Beulich)
 - fix coding style issue. (suggested by Wei Liu)
 - rename 'assemble_val_array' to 'combine_val_array' in previous patch. (suggested by Wei Liu)
 - use 'spin_is_locked' to check ref_lock. (suggested by Roger Pau)
 - add an input parameter 'array_len' for 'write_psr_msr'.
 - check 'socket_info' and 'psr_cos_ids' in this patch. (suggested by Jan Beulich)
 - modify patch title to indicate 'L3 CAT'. (suggested by Jan Beulich)
 - fix commit message words. (suggested by Jan Beulich)
 - change 'assemble_val_array' to 'gather_val_array'. (suggested by Jan Beulich)
 - change 'set_new_val_to_array' to 'insert_new_val_to_array'. (suggested by Jan Beulich)
 - change parameter 'm' of 'insert_new_val_to_array' to 'new_val'. (suggested by Jan Beulich)
 - change 'write_psr_msr' to 'write_psr_msrs'. (suggested by Jan Beulich)
 - correct comments. (suggested by Jan Beulich)
 - remove unnecessary comments. (suggested by Jan Beulich)
 - adjust conditions after 'find_cos' to save a level of indentation. (suggested by Jan Beulich)
 - add 'ASSERT(!old_cos || ref[old_cos])'. (suggested by Jan Beulich)
 - move ASSERT() check into locked region. (suggested by Jan Beulich)
 - replace parameter '*val' to 'val[]' in some functions. (suggested by Jan Beulich)
 - change 'write_psr_msr' parameters to prepare to only set one new value for one feature. (suggested by Jan Beulich)
 - changes about 'uint64_t' to 'uint32_t'. (suggested by Jan Beulich)
 - add explanation about context switch. (suggested by Jan Beulich)
v5:
 - modify commit message. (suggested by Jan Beulich)
 - return an error for all helper functions in set flow. (suggested by Jan Beulich)
 - remove unnecessary cast. (suggested by Jan Beulich)
 - divide 'get_old_set_new' into two functions, 'assemble_val_array' and 'set_new_val_to_array'. (suggested by Jan Beulich)
 - modify comments. (suggested by Jan Beulich)
 - adjust code format. (suggested by Jan Beulich)
 - change 'alloc_new_cos' to 'pick_avail_cos' to make name accurate. (suggested by Jan Beulich)
 - check feature type when entering 'psr_set_val'. (suggested by Jan Beulich)
 - use ASSERT to check ref. (suggested by Jan Beulich)
 - rename 'dat[]' to 'data[]'. (suggested by Jan Beulich)
v4:
 - create this patch to make codes easier to understand. (suggested by Jan Beulich)
---
 xen/arch/x86/domctl.c     |  30 +++++--
 xen/arch/x86/psr.c        | 215 +++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/psr.h |   4 +-
 3 files changed, 236 insertions(+), 13 deletions(-)
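As an aside on the context-switch scenarios described in the commit message: the schedule-in path only ever reads psr_cos_ids[socket] and programs it into the association MSR, masked by cos_mask, which is why a momentarily stale COS ID is harmless. A simplified sketch follows; the MSR number and the 32-bit shift of the COS field are taken from the SDM, and the per-CPU 'psr_assoc_reg' cache is an assumption, none of it coming from this patch:

    /* Simplified sketch of the schedule-in path ('psr_ctxt_switch_to()'). */
    static void psr_sched_in_sketch(const struct domain *d, unsigned int socket,
                                    uint64_t cos_mask)
    {
        uint64_t reg = this_cpu(psr_assoc_reg);
        unsigned int cos = d->arch.psr_cos_ids ? d->arch.psr_cos_ids[socket] : 0;

        /*
         * Even a stale 'cos' (scenario 3) stays within cos_max after masking,
         * so a valid set of COS registers is always selected.
         */
        reg = (reg & ~cos_mask) | (((uint64_t)cos << 32) & cos_mask);

        if ( reg != this_cpu(psr_assoc_reg) )
        {
            wrmsrl(0xC8F, reg);    /* IA32_PQR_ASSOC */
            this_cpu(psr_assoc_reg) = reg;
        }
    }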