diff mbox

[v10,09/25] x86: refactor psr: L3 CAT: set value: implement framework.

Message ID 20170418105533.GK17458@yi.y.sun (mailing list archive)
State New, archived
Headers show

Commit Message

Yi Sun April 18, 2017, 10:55 a.m. UTC
On 17-04-13 05:50:01, Jan Beulich wrote:
> >>> On 13.04.17 at 13:44, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-13 05:31:41, Jan Beulich wrote:
> >> >>> On 13.04.17 at 13:11, <yi.y.sun@linux.intel.com> wrote:
> >> > On 17-04-13 04:58:06, Jan Beulich wrote:
> >> >> >>> On 13.04.17 at 12:49, <yi.y.sun@linux.intel.com> wrote:
> >> >> > How about a per socket array like this:
> >> >> > uint32_t domain_switch[1024];
> >> >> > 
> >> >> > Every bit represents a domain id. Then, we can handle this case as below:
> >> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
> >> >> >    cover socket offline case. We do not need to clear it in 
> >> >> > 'free_socket_resources'.
> >> >> > 
> >> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
> >> >> >    the bit to 1 according to domain_id. If the old value is 0 and the 
> >> >> >    'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> >> >> > 
> >> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
> >> >> >    to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
> >> >> > 
> >> >> > Then, we only use 4KB for one socket.
> >> >> 
> >> >> This looks to come closer to something I'd consider acceptable, but
> >> >> I may not understand your intentions in full yet: For one, there's
> >> >> nowhere you clear the bit (other than presumably during socket
> >> >> cleanup). 
> >> > 
> >> > Actually, clear the array in 'free_socket_resources' has same effect. I can
> >> > move clear action into it.
> >> 
> >> That wasn't my point - I was asking about clearing individual bits.
> >> Point being that if you only ever set bits in the map, you'll likely
> >> end up iterating through all active domains anyway.
> >> 
> > If entering 'free_socket_resources', that means no more actions to
> > the array on this socket except clearing it. Can I just memset this array
> > of the socekt to 0?
> 
> You can, afaict, but unless first you act on the set bits I can't see why
> you would want to track the bits in the first place. Or maybe I'm still
> not understanding your intention, in which case I guess the best you
> can do is simply implement your plan, and we'll discuss it in v11 review.
> 
I made a test patch based on v10 and attached it in mail. Could you please
help to check it? Thanks!

> Jan

Comments

Jan Beulich April 18, 2017, 11:46 a.m. UTC | #1
>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
> I made a test patch based on v10 and attached it in mail. Could you please
> help to check it? Thanks!

This looks reasonable at the first glance, albeit I continue to be
unconvinced that this is the only (reasonable) way of solving the
problem. After all we don't have to go through similar hoops for
any other of the register state associated with a vCPU. There
are a number of cosmetic issues, but commenting on an attached
(rather than inlined) patch is a little difficult.

One remark regarding the locking though: Acquiring a lock in the
context switch path should be made as low risk of long stalls as
possible. Therefore you will want to consider using r/w locks
instead of spin locks here, which would allow parallelism on all
cores of a socket as long as COS IDs aren't being updated.

Jan
Yi Sun April 19, 2017, 8:22 a.m. UTC | #2
On 17-04-18 05:46:43, Jan Beulich wrote:
> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
> > I made a test patch based on v10 and attached it in mail. Could you please
> > help to check it? Thanks!
> 
> This looks reasonable at the first glance, albeit I continue to be
> unconvinced that this is the only (reasonable) way of solving the
> problem. After all we don't have to go through similar hoops for
> any other of the register state associated with a vCPU. There

Sorry, I do not understand your meaning clearly.
1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
   we check 'dom_ids' array to know if the domain's cos id has not been set but
   its 'psr_cos_ids[socket]' already has a non zero value. This case means the
   socket offline has happened so that we have to restore the domain's
   'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
   I think we have discussed this in previous mails and achieved agreement on
   such logic. 
2. DYM the COS MSRs? We have discussed this before. Per your comments, when
   socket is online, the registers values may be modified by FW and others.
   So, we have to restore them to default value which is being done in
   'cat_init_feature'.

So, what is your exactly meaning here? Thanks!

> are a number of cosmetic issues, but commenting on an attached
> (rather than inlined) patch is a little difficult.
> 
Sorry for that, I will directly send patch out next time.

> One remark regarding the locking though: Acquiring a lock in the
> context switch path should be made as low risk of long stalls as
> possible. Therefore you will want to consider using r/w locks
> instead of spin locks here, which would allow parallelism on all
> cores of a socket as long as COS IDs aren't being updated.
> 
In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
So, I do not understand why read-write lock is better? Anything I don't
consider? Please help to point out. Thanks!

> Jan
Jan Beulich April 19, 2017, 9 a.m. UTC | #3
>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-18 05:46:43, Jan Beulich wrote:
>> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
>> > I made a test patch based on v10 and attached it in mail. Could you please
>> > help to check it? Thanks!
>> 
>> This looks reasonable at the first glance, albeit I continue to be
>> unconvinced that this is the only (reasonable) way of solving the
>> problem. After all we don't have to go through similar hoops for
>> any other of the register state associated with a vCPU. There
> 
> Sorry, I do not understand your meaning clearly.
> 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
>    we check 'dom_ids' array to know if the domain's cos id has not been set but
>    its 'psr_cos_ids[socket]' already has a non zero value. This case means the
>    socket offline has happened so that we have to restore the domain's
>    'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
>    I think we have discussed this in previous mails and achieved agreement on
>    such logic. 
> 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
>    socket is online, the registers values may be modified by FW and others.
>    So, we have to restore them to default value which is being done in
>    'cat_init_feature'.
> 
> So, what is your exactly meaning here? Thanks!

Neither of the two. I'm comparing with COS/PSR-_unrelated_
handling of register state. After all there are other MSRs which
we need to put into the right state after a core comes online.

>> are a number of cosmetic issues, but commenting on an attached
>> (rather than inlined) patch is a little difficult.
>> 
> Sorry for that, I will directly send patch out next time.
> 
>> One remark regarding the locking though: Acquiring a lock in the
>> context switch path should be made as low risk of long stalls as
>> possible. Therefore you will want to consider using r/w locks
>> instead of spin locks here, which would allow parallelism on all
>> cores of a socket as long as COS IDs aren't being updated.
>> 
> In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
> So, I do not understand why read-write lock is better? Anything I don't
> consider? Please help to point out. Thanks!

Hmm, looking again I can see that r/w locks indeed may not help
here. However, what you say still doesn't look correct to me: You
acquire the lock depending on _only_ psra->cos_mask being non-
zero. This means that all cores on one socket are being serialized
here. Quite possibly all you need is for some of the checks done
inside the locked region to be replicated (but _not_ moved) to the
outer if(), to limit the number of times where the lock is to be
acquired.

Jan
Yi Sun April 20, 2017, 2:14 a.m. UTC | #4
On 17-04-19 03:00:29, Jan Beulich wrote:
> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-18 05:46:43, Jan Beulich wrote:
> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
> >> > I made a test patch based on v10 and attached it in mail. Could you please
> >> > help to check it? Thanks!
> >> 
> >> This looks reasonable at the first glance, albeit I continue to be
> >> unconvinced that this is the only (reasonable) way of solving the

Do you have any other suggestion on this? Thanks!

> >> problem. After all we don't have to go through similar hoops for
> >> any other of the register state associated with a vCPU. There
> > 
> > Sorry, I do not understand your meaning clearly.
> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
> >    we check 'dom_ids' array to know if the domain's cos id has not been set but
> >    its 'psr_cos_ids[socket]' already has a non zero value. This case means the
> >    socket offline has happened so that we have to restore the domain's
> >    'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
> >    I think we have discussed this in previous mails and achieved agreement on
> >    such logic. 
> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
> >    socket is online, the registers values may be modified by FW and others.
> >    So, we have to restore them to default value which is being done in
> >    'cat_init_feature'.
> > 
> > So, what is your exactly meaning here? Thanks!
> 
> Neither of the two. I'm comparing with COS/PSR-_unrelated_
> handling of register state. After all there are other MSRs which
> we need to put into the right state after a core comes online.
> 
For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket
is offline. The values in it are all 0 when socket is online again. This causes
error when user shows the CBMs. So, we have two options when socket is online:
1. Read COS MSRs values and save them into 'cos_reg_val[]'.
2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.

Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
what we discussed. FYR.

[v9 patch 05]
>> >> > +    /* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */
>> >> > +    feat->cos_reg_val[0] = cat_default_val(cat.cbm_len);
>> >> > +    /*
>> >> > +     * To handle cpu offline and then online case, we need read MSRs back to
>> >> > +     * save values into cos_reg_val array.
>> >> > +     */
>> >> > +    for ( i = 1; i <= cat.cos_max; i++ )
>> >> > +    {
>> >> > +        rdmsrl(MSR_IA32_PSR_L3_MASK(i), val);
>> >> > +        feat->cos_reg_val[i] = (uint32_t)val;
>> >> > +    }
>> >> 
[Jan]
>> >> You mention this in the changes done, but I don't understand why 
>> >> you do this. What meaning to these values have to you? If you want 
>> >> hardware and cached values to match up, the much more conventional 
>> >> way of enforcing this would be to write the values you actually 
>> >> want (normally all zero).
>> >>
[Sun Yi] 
>> > When all cpus on a socket are offline, the free_feature() is called 
>> > to free features resources so that the values saved in 
>> > cos_reg_val[] are lost. When the socket is online again, features 
>> > are allocated again so that cos_reg_val[] members are all 
>> > initialized to 0. Only is cos_reg_val[0] initialized to default value in this function in old codes.
>> > 
>> > But domain is still alive so that its cos id on the socket is kept. 
>> > The corresponding MSR value is kept too per test. To make 
>> > cos_reg_val[] values be same as HW to not to mislead user, we 
>> > should read back the valid values on HW into cos_reg_val[].
>> 
[Jan]
>> Okay, I understand the background, but I don't view this solution as 
>> viable: Once the last core on a socket goes offline, all references 
>> to it should be cleaned up. After all what will be brought back 
>> online may be a different physical CPU altogether; you can't assume 
>> MSR values to have survived even if it is the same CPU which comes 
>> back online, as it may have undergone a reset cycle, or BIOS/SMM may 
>> have played with the MSRs.
>> That's even a possibility for a single core coming back online, so 
>> you have to reload MSRs explicitly anyway if implicit reloading (i.e. 
>> once vCPU-s get scheduled onto it) doesn't suffice.
>> 
[Sun Yi]
> So, you think the MSRs values may not be valid after such process and 
> reloading (write MSRs to default value) is needed. If so, I would like 
> to do more operations in 'free_feature()':
> 1. Iterate all domains working on the offline socket to change
>    'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
>    status.
> 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
> 
> These can make the socket's info be totally restored back to init status.
[Jan]
Yes, that's what I think is needed.

> >> are a number of cosmetic issues, but commenting on an attached
> >> (rather than inlined) patch is a little difficult.
> >> 
> > Sorry for that, I will directly send patch out next time.
> > 
> >> One remark regarding the locking though: Acquiring a lock in the
> >> context switch path should be made as low risk of long stalls as
> >> possible. Therefore you will want to consider using r/w locks
> >> instead of spin locks here, which would allow parallelism on all
> >> cores of a socket as long as COS IDs aren't being updated.
> >> 
> > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
> > So, I do not understand why read-write lock is better? Anything I don't
> > consider? Please help to point out. Thanks!
> 
> Hmm, looking again I can see that r/w locks indeed may not help
> here. However, what you say still doesn't look correct to me: You
> acquire the lock depending on _only_ psra->cos_mask being non-
> zero. This means that all cores on one socket are being serialized
> here. Quite possibly all you need is for some of the checks done
> inside the locked region to be replicated (but _not_ moved) to the
> outer if(), to limit the number of times where the lock is to be
> acquired.
> 
I think your suggestions is to check old_cos outer the lock region.
If it is not 0 which means the domain's cos id does not need restore
to 0, we can directly set it into ASSOC register. That can avoid
unnecessary lock. I will send out the test patch again to ask your
help to provide review comments (you said there are also 'a number
of cosmetics issues'). Thanks!

> Jan
Jan Beulich April 20, 2017, 9:43 a.m. UTC | #5
>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-19 03:00:29, Jan Beulich wrote:
>> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-04-18 05:46:43, Jan Beulich wrote:
>> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
>> >> > I made a test patch based on v10 and attached it in mail. Could you please
>> >> > help to check it? Thanks!
>> >> 
>> >> This looks reasonable at the first glance, albeit I continue to be
>> >> unconvinced that this is the only (reasonable) way of solving the
> 
> Do you have any other suggestion on this? Thanks!

I'm sorry, but no. I'm already spending _much_ more time on this
series than I should be, considering its (afaic) relatively low
priority. I really have to ask you to properly think through both
the data layout and code structure, including considering
alternatives especially in places where you can _anticipate_
controversy.

>> >> problem. After all we don't have to go through similar hoops for
>> >> any other of the register state associated with a vCPU. There
>> > 
>> > Sorry, I do not understand your meaning clearly.
>> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
>> >    we check 'dom_ids' array to know if the domain's cos id has not been set but
>> >    its 'psr_cos_ids[socket]' already has a non zero value. This case means the
>> >    socket offline has happened so that we have to restore the domain's
>> >    'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
>> >    I think we have discussed this in previous mails and achieved agreement on
>> >    such logic. 
>> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
>> >    socket is online, the registers values may be modified by FW and others.
>> >    So, we have to restore them to default value which is being done in
>> >    'cat_init_feature'.
>> > 
>> > So, what is your exactly meaning here? Thanks!
>> 
>> Neither of the two. I'm comparing with COS/PSR-_unrelated_
>> handling of register state. After all there are other MSRs which
>> we need to put into the right state after a core comes online.
>> 
> For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket
> is offline. The values in it are all 0 when socket is online again. This causes
> error when user shows the CBMs. So, we have two options when socket is 
> online:
> 1. Read COS MSRs values and save them into 'cos_reg_val[]'.
> 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.

This re-states what you want to do; it does not answer my question.
Along the lines of what you say, for example FS and GS base MSRs
come back as zero too after a socket has been (re-)onlined. We
don't need to go through any hoops there, nevertheless.

> Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
> what we discussed. FYR.

Well, we decided option 2 is better than option 1. I'm still
unconvinced there's not a yet better alternative.

>> >> are a number of cosmetic issues, but commenting on an attached
>> >> (rather than inlined) patch is a little difficult.
>> >> 
>> > Sorry for that, I will directly send patch out next time.
>> > 
>> >> One remark regarding the locking though: Acquiring a lock in the
>> >> context switch path should be made as low risk of long stalls as
>> >> possible. Therefore you will want to consider using r/w locks
>> >> instead of spin locks here, which would allow parallelism on all
>> >> cores of a socket as long as COS IDs aren't being updated.
>> >> 
>> > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
>> > So, I do not understand why read-write lock is better? Anything I don't
>> > consider? Please help to point out. Thanks!
>> 
>> Hmm, looking again I can see that r/w locks indeed may not help
>> here. However, what you say still doesn't look correct to me: You
>> acquire the lock depending on _only_ psra->cos_mask being non-
>> zero. This means that all cores on one socket are being serialized
>> here. Quite possibly all you need is for some of the checks done
>> inside the locked region to be replicated (but _not_ moved) to the
>> outer if(), to limit the number of times where the lock is to be
>> acquired.
>> 
> I think your suggestions is to check old_cos outer the lock region.

My suggestion is to check as much state as possible, to prevent
having to acquire the lock whenever possible.

> If it is not 0 which means the domain's cos id does not need restore
> to 0, we can directly set it into ASSOC register. That can avoid
> unnecessary lock. I will send out the test patch again to ask your
> help to provide review comments (you said there are also 'a number
> of cosmetics issues').

And I would hope you would try to eliminate some (if not all) yourself
first. After all you can easily go over your patch yourself, checking
for e.g. style violations.

Jan
Lars Kurth April 20, 2017, 1:02 p.m. UTC | #6
Apologies for stepping in here. But it feels to me that this thread is at risk of becoming unproductive.

> On 20 Apr 2017, at 10:43, Jan Beulich <jbeulich@suse.com> wrote:
> 
>>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote:
>> On 17-04-19 03:00:29, Jan Beulich wrote:
>>>>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
>>>> On 17-04-18 05:46:43, Jan Beulich wrote:
>>>>>>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
>>>>>> I made a test patch based on v10 and attached it in mail. Could you please
>>>>>> help to check it? Thanks!
>>>>> 
>>>>> This looks reasonable at the first glance, albeit I continue to be
>>>>> unconvinced that this is the only (reasonable) way of solving the
>> 
>> Do you have any other suggestion on this? Thanks!
> 
> I'm sorry, but no. I'm already spending _much_ more time on this
> series than I should be, considering its (afaic) relatively low
> priority. I really have to ask you to properly think through both
> the data layout and code structure, including considering
> alternatives especially in places where you can _anticipate_
> controversy.

Jan, can you confirm whether you are happy with the current proposal. You say "this looks reasonable at the first glance", but then go on to say that there may be other "reasonable ways" of solving the problem at hand.

This is not very concrete and hard to respond to from a contributors point of view: it would be good to establish whether a) the v10 proposal in this area is good enough, b) whether there are any concrete improvements to be made.

You say please think through whether "you can _anticipate_ controversy", but at the same time you can't currently identify/think of any issues. That seems to suggest to me that maybe the proposal is good enough. Or that something is missing, which has not been articulated. Taking into account language barriers, more clarity would probably be helpful.

>> Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
>> what we discussed. FYR.
> 
> Well, we decided option 2 is better than option 1. I'm still
> unconvinced there's not a yet better alternative.

I suppose that is the same type of argument. Aka we looked at a number of options, seem to have agreed one is better than the other. But there is no clarity as to whether in this case option 2 is good enough and what concrete steps would be needed to get to a better alternative.

Of course I may have missed some of the context here in some of the older threads and thus I may have missed some of the context. 

>> If it is not 0 which means the domain's cos id does not need restore
>> to 0, we can directly set it into ASSOC register. That can avoid
>> unnecessary lock. I will send out the test patch again to ask your
>> help to provide review comments (you said there are also 'a number
>> of cosmetics issues').
> 
> And I would hope you would try to eliminate some (if not all) yourself
> first. After all you can easily go over your patch yourself, checking
> for e.g. style violations.

I think this is fair enough.

Best Regards
Lars
Jan Beulich April 20, 2017, 1:21 p.m. UTC | #7
>>> On 20.04.17 at 15:02, <lars.kurth.xen@gmail.com> wrote:
>> On 20 Apr 2017, at 10:43, Jan Beulich <jbeulich@suse.com> wrote:
>> 
>>>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote:
>>> On 17-04-19 03:00:29, Jan Beulich wrote:
>>>>>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
>>>>> On 17-04-18 05:46:43, Jan Beulich wrote:
>>>>>>>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
>>>>>>> I made a test patch based on v10 and attached it in mail. Could you please
>>>>>>> help to check it? Thanks!
>>>>>> 
>>>>>> This looks reasonable at the first glance, albeit I continue to be
>>>>>> unconvinced that this is the only (reasonable) way of solving the
>>> 
>>> Do you have any other suggestion on this? Thanks!
>> 
>> I'm sorry, but no. I'm already spending _much_ more time on this
>> series than I should be, considering its (afaic) relatively low
>> priority. I really have to ask you to properly think through both
>> the data layout and code structure, including considering
>> alternatives especially in places where you can _anticipate_
>> controversy.
> 
> Jan, can you confirm whether you are happy with the current proposal. You 
> say "this looks reasonable at the first glance", but then go on to say that 
> there may be other "reasonable ways" of solving the problem at hand.
> 
> This is not very concrete and hard to respond to from a contributors point 
> of view: it would be good to establish whether a) the v10 proposal in this 
> area is good enough, b) whether there are any concrete improvements to be 
> made.

I understand it's not very concrete, but please understand that with
the over 100 patches wanting looking at right now it is simply
impossible for me to give precise suggestions everywhere. I really
have to be allowed to defer to the originator to come up with
possible better mechanisms (or defend what there is by addressing
questions raised), especially with - as said - the amount of time spent
here already having been way higher than is justifiable. Just go and
compare v10 with one of the initial versions: Almost all of the data
layout and code flow have fundamentally changed, mostly based on
feedback I gave. I'm sorry for saying that, but to me this is an
indication that things hadn't been thought through well in the design
phase, i.e. before even submitting a first RFC.

> You say please think through whether "you can _anticipate_ controversy", but 
> at the same time you can't currently identify/think of any issues. That seems 
> to suggest to me that maybe the proposal is good enough. Or that something is 
> missing, which has not been articulated. Taking into account language 
> barriers, more clarity would probably be helpful.

I've given criteria by which I have the gut feeling (but no more)
that this isn't the right approach. I'm absolutely fine to be
convinced that my gut feeling is wrong. That would require to
simply answer the question I raised multiple times, and which was
repeatedly "answered" by simply re-stating previously expressed
facts.

If this reaction of mine is not acceptable, all I can do is refrain
from further looking at this series. And Yi, I certainly apologize
for perhaps not doing these reviews wholeheartedly, since -
as also expressed before - I continue to not really view this as
very important functionality. Yet considering for how long some
of the versions hadn't been looked at by anyone at all, the
alternative would have been to simply let it sit further without
looking at it. I actually take this lack of interest by others as an
indication that I'm not the only one considering this nice to have,
but no more.

Jan
Lars Kurth April 20, 2017, 4:52 p.m. UTC | #8
> On 20 Apr 2017, at 14:21, Jan Beulich <jbeulich@suse.com> wrote:
> 
>>>> On 20.04.17 at 15:02, <lars.kurth.xen@gmail.com> wrote:
>>> On 20 Apr 2017, at 10:43, Jan Beulich <jbeulich@suse.com> wrote:
>>> 
>>>>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote:
>>>> On 17-04-19 03:00:29, Jan Beulich wrote:
>>>>>>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
>>>>>> On 17-04-18 05:46:43, Jan Beulich wrote:
>>>>>>>>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
>>>>>>>> I made a test patch based on v10 and attached it in mail. Could you please
>>>>>>>> help to check it? Thanks!
>>>>>>> 

[Item 1]

>>>>>>> This looks reasonable at the first glance, albeit I continue to be
>>>>>>> unconvinced that this is the only (reasonable) way of solving the
>>>> 
>>>> Do you have any other suggestion on this? Thanks!
>>> 
>>> I'm sorry, but no. I'm already spending _much_ more time on this
>>> series than I should be, considering its (afaic) relatively low
>>> priority. I really have to ask you to properly think through both
>>> the data layout and code structure, including considering
>>> alternatives especially in places where you can _anticipate_
>>> controversy.
>> 
>> Jan, can you confirm whether you are happy with the current proposal. You 
>> say "this looks reasonable at the first glance", but then go on to say that 
>> there may be other "reasonable ways" of solving the problem at hand.
>> 
>> This is not very concrete and hard to respond to from a contributors point 
>> of view: it would be good to establish whether a) the v10 proposal in this 
>> area is good enough, b) whether there are any concrete improvements to be 
>> made.

> I understand it's not very concrete, but please understand that with
> the over 100 patches wanting looking at right now it is simply
> impossible for me to give precise suggestions everywhere. I really
> have to be allowed to defer to the originator to come up with
> possible better mechanisms (or defend what there is by addressing
> questions raised),

Jan, I don't object to the principle of deferring issues to a contributor, for contributor to defend their viewpoint or to come up with a better mechanism. I just observed, that I could not make a lot of sense what you were looking for in this particular review. I am assuming that it would be even harder for Yi. 

> especially with - as said - the amount of time spent
> here already having been way higher than is justifiable.

And of course we have passed the 4.9 code freeze, so some of the pressure is off. At the same time I understand that because of the upcoming releases you need to focus on bug fixes, etc.

> Just go and
> compare v10 with one of the initial versions: Almost all of the data
> layout and code flow have fundamentally changed, mostly based on
> feedback I gave. I'm sorry for saying that, but to me this is an
> indication that things hadn't been thought through well in the design
> phase, i.e. before even submitting a first RFC.

That is good feedback which may contain some valuable lessons. Once we are through this (or maybe at the summit) it may be worthwhile to look at what has gone wrong and see how we can do better in future.

[Item 2]

>> You say please think through whether "you can _anticipate_ controversy", but 
>> at the same time you can't currently identify/think of any issues. That seems 
>> to suggest to me that maybe the proposal is good enough. Or that something is 
>> missing, which has not been articulated. Taking into account language 
>> barriers, more clarity would probably be helpful.
> 
> I've given criteria by which I have the gut feeling (but no more)
> that this isn't the right approach. I'm absolutely fine to be
> convinced that my gut feeling is wrong. That would require to
> simply answer the question I raised multiple times, and which was
> repeatedly "answered" by simply re-stating previously expressed
> facts.

I have not followed the full thread. But it seems that we have communications issue there. Normally this happens when expectations don't quite match and one party (in this case Yi) does not quite get what the other one is looking for. Maybe the best approach would be for Yi to get some of these things resolved during a short IRC conversation with you. I did see him and others resolve some previous issues more effectively on IRC. 

> If this reaction of mine is not acceptable, all I can do is refrain
> from further looking at this series. And Yi, I certainly apologize
> for perhaps not doing these reviews wholeheartedly, since -
> as also expressed before - I continue to not really view this as
> very important functionality.

As I said earlier, I stepped in, as I didn't really understand what was going on. I think this is a little clearer now. 

So to summarise: 

On item 1: it appears that you are looking for a some more justification why some of the changes were made, maybe with a rationale for some of the choices that were made. Given that this is quite a complex series which has diverged quite a lot from the design, the goal is to make it easier for either you (or someone else) to sanity check the proposal which on the face of things look OK. But you have some doubts and you can't easily check against the design as it is out-of-date.

On item 2: you think something may not be quite right, but you can't really decide until a couple of questions (not quite sure which, but I am sure Yi can locate them) are answered.

Let me know whether this is actually true. 

> Yet considering for how long some
> of the versions hadn't been looked at by anyone at all, the
> alternative would have been to simply let it sit further without
> looking at it. I actually take this lack of interest by others as an
> indication that I'm not the only one considering this nice to have,
> but no more.

That's a good point. There have been some comments from others, but it is also true that 66% of comments on this series did come from you (see https://xen.biterg.io:443/goto/72a753919aa8e3205d03da7b430d2696). On the other hand, this is also a series which is hard to hand over to someone else.

Regards
Lars
Konrad Rzeszutek Wilk April 21, 2017, 1:13 a.m. UTC | #9
> If this reaction of mine is not acceptable, all I can do is refrain
> from further looking at this series. And Yi, I certainly apologize
> for perhaps not doing these reviews wholeheartedly, since -
> as also expressed before - I continue to not really view this as
> very important functionality. Yet considering for how long some
> of the versions hadn't been looked at by anyone at all, the
> alternative would have been to simply let it sit further without
> looking at it. I actually take this lack of interest by others as an
> indication that I'm not the only one considering this nice to have,
> but no more.

I do have an interest in this series. And I can certainly give it
a review - but once you get your teeth in a patchset I feel it is
bit counterintuitive to review it - as you do a much much better
job that I could have.
Jan Beulich April 21, 2017, 6:11 a.m. UTC | #10
>>> On 20.04.17 at 18:52, <lars.kurth.xen@gmail.com> wrote:
> So to summarise: 
> 
> On item 1: it appears that you are looking for a some more justification why 
> some of the changes were made, maybe with a rationale for some of the choices 
> that were made. Given that this is quite a complex series which has diverged 
> quite a lot from the design, the goal is to make it easier for either you (or 
> someone else) to sanity check the proposal which on the face of things look 
> OK. But you have some doubts and you can't easily check against the design as 
> it is out-of-date.
> 
> On item 2: you think something may not be quite right, but you can't really 
> decide until a couple of questions (not quite sure which, but I am sure Yi 
> can locate them) are answered.
> 
> Let me know whether this is actually true. 

Well, afaict both actually boil down to the same single question
regarding the special handling of CAT MSRs after onlining (at
runtime) a core on a socket all of whose cores had been offline,
namely considering that other CPU registers don't require any
such special treatment (in context switch code or elsewhere).

As to the design possibly being out of date - I have to admit I
didn't even check whether the accompanying documentation
has been kept up to date with the actual code changes. The
matter here really isn't with comparing with the design, but
rather whether the design choice (written down or not) was
an appropriate one.

Jan
Jan Beulich April 21, 2017, 6:18 a.m. UTC | #11
>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote:
> On 17-04-19 03:00:29, Jan Beulich wrote:
>> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
>> > On 17-04-18 05:46:43, Jan Beulich wrote:
>> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
>> >> > I made a test patch based on v10 and attached it in mail. Could you please
>> >> > help to check it? Thanks!
>> >> 
>> >> This looks reasonable at the first glance, albeit I continue to be
>> >> unconvinced that this is the only (reasonable) way of solving the
> 
> Do you have any other suggestion on this? Thanks!
> 
>> >> problem. After all we don't have to go through similar hoops for
>> >> any other of the register state associated with a vCPU. There
>> > 
>> > Sorry, I do not understand your meaning clearly.
>> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
>> >    we check 'dom_ids' array to know if the domain's cos id has not been set but
>> >    its 'psr_cos_ids[socket]' already has a non zero value. This case means the
>> >    socket offline has happened so that we have to restore the domain's
>> >    'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
>> >    I think we have discussed this in previous mails and achieved agreement on
>> >    such logic. 
>> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
>> >    socket is online, the registers values may be modified by FW and others.
>> >    So, we have to restore them to default value which is being done in
>> >    'cat_init_feature'.
>> > 
>> > So, what is your exactly meaning here? Thanks!
>> 
>> Neither of the two. I'm comparing with COS/PSR-_unrelated_
>> handling of register state. After all there are other MSRs which
>> we need to put into the right state after a core comes online.
>> 
> For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket
> is offline. The values in it are all 0 when socket is online again. This causes
> error when user shows the CBMs. So, we have two options when socket is online:
> 1. Read COS MSRs values and save them into 'cos_reg_val[]'.
> 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.
> 
> Per our discussion on v9 patch 05, we decided to adopt option 2.

Btw., having thought some more about this, putting code into the
context switch path when there is an alternative is probably the
wrong thing after all, i.e. if special treatment _is_ really needed,
doing it in less frequently executed code would likely be better.
But as before - much depends on clarifying why special treatment
is needed here, but not elsewhere (and to avoid questions, with
"elsewhere" I mean outside of PSR/CAT code - there's plenty of
other CPU register state to take as reference).

Jan
Yi Sun April 24, 2017, 6:40 a.m. UTC | #12
On 17-04-21 00:18:27, Jan Beulich wrote:
> >>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote:
> > On 17-04-19 03:00:29, Jan Beulich wrote:
> >> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote:
> >> > On 17-04-18 05:46:43, Jan Beulich wrote:
> >> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote:
> >> >> > I made a test patch based on v10 and attached it in mail. Could you please
> >> >> > help to check it? Thanks!
> >> >> 
> >> >> This looks reasonable at the first glance, albeit I continue to be
> >> >> unconvinced that this is the only (reasonable) way of solving the
> > 
> > Do you have any other suggestion on this? Thanks!
> > 
> >> >> problem. After all we don't have to go through similar hoops for
> >> >> any other of the register state associated with a vCPU. There
> >> > 
> >> > Sorry, I do not understand your meaning clearly.
> >> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
> >> >    we check 'dom_ids' array to know if the domain's cos id has not been set but
> >> >    its 'psr_cos_ids[socket]' already has a non zero value. This case means the
> >> >    socket offline has happened so that we have to restore the domain's
> >> >    'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
> >> >    I think we have discussed this in previous mails and achieved agreement on
> >> >    such logic. 
> >> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
> >> >    socket is online, the registers values may be modified by FW and others.
> >> >    So, we have to restore them to default value which is being done in
> >> >    'cat_init_feature'.
> >> > 
> >> > So, what is your exactly meaning here? Thanks!
> >> 
> >> Neither of the two. I'm comparing with COS/PSR-_unrelated_
> >> handling of register state. After all there are other MSRs which
> >> we need to put into the right state after a core comes online.
> >> 
> > For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket
> > is offline. The values in it are all 0 when socket is online again. This causes
> > error when user shows the CBMs. So, we have two options when socket is online:
> > 1. Read COS MSRs values and save them into 'cos_reg_val[]'.
> > 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.
> > 
> > Per our discussion on v9 patch 05, we decided to adopt option 2.
> 
> Btw., having thought some more about this, putting code into the
> context switch path when there is an alternative is probably the
> wrong thing after all, i.e. if special treatment _is_ really needed,
> doing it in less frequently executed code would likely be better.
> But as before - much depends on clarifying why special treatment
> is needed here, but not elsewhere (and to avoid questions, with
> "elsewhere" I mean outside of PSR/CAT code - there's plenty of
> other CPU register state to take as reference).
> 
Hi, Jan,

As what we talked on IRC last Friday, I have got answers for your
two final questions below:
1. Why domain setting is designed to per-socket, any reason? 
Answer: There is a real case from Intel's customer. HSX (Haswell server)
and BDX (Broadwell server) processors are plugged into each socket of a
Grantley platform. HSX does not support CAT but BDX does.

You and Chao Peng discussed this before for CAT feature enabling patches.
The asymmetry supporting is agreed.
http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs3ti+state:results
http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63ejte+state:results

2. Why cannot the previous setting to the domains be kept when socket is online?
Answer: If the asymmetry system is supported, we cannot assume the configuration
can be applied to new socket.

BRs,
Sun Yi
Jan Beulich April 24, 2017, 6:55 a.m. UTC | #13
>>> On 24.04.17 at 08:40, <yi.y.sun@linux.intel.com> wrote:
> As what we talked on IRC last Friday, I have got answers for your
> two final questions below:
> 1. Why domain setting is designed to per-socket, any reason? 
> Answer: There is a real case from Intel's customer. HSX (Haswell server)
> and BDX (Broadwell server) processors are plugged into each socket of a
> Grantley platform. HSX does not support CAT but BDX does.
> 
> You and Chao Peng discussed this before for CAT feature enabling patches.
> The asymmetry supporting is agreed.

I don't see any agreement in those threads. The first sub-thread is
merely mentioning this configuration, while the second is only about
nr_sockets calculation.

> http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs 
> 3ti+state:results
> http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63e 
> jte+state:results
> 
> 2. Why cannot the previous setting to the domains be kept when socket is online?
> Answer: If the asymmetry system is supported, we cannot assume the configuration
> can be applied to new socket.

I'm afraid we'd have problems elsewhere if we tried to run Xen on
a mixed-model system. Unless you can prove PSR/CAT is the only
relevant (read: affecting Xen's behavior) hardware difference
between Haswell and Broadwell (for example, isn't the former
SMEP only, but the latter SMEP+SMAP?), I don't buy this as a
reason to have more complicated than necessary code.

Jan
Yi Sun April 25, 2017, 7:15 a.m. UTC | #14
On 17-04-24 00:55:38, Jan Beulich wrote:
> >>> On 24.04.17 at 08:40, <yi.y.sun@linux.intel.com> wrote:
> > As what we talked on IRC last Friday, I have got answers for your
> > two final questions below:
> > 1. Why domain setting is designed to per-socket, any reason? 
> > Answer: There is a real case from Intel's customer. HSX (Haswell server)
> > and BDX (Broadwell server) processors are plugged into each socket of a
> > Grantley platform. HSX does not support CAT but BDX does.
> > 
> > You and Chao Peng discussed this before for CAT feature enabling patches.
> > The asymmetry supporting is agreed.
> 
> I don't see any agreement in those threads. The first sub-thread is
> merely mentioning this configuration, while the second is only about
> nr_sockets calculation.
> 
> > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs 
> > 3ti+state:results
> > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63e 
> > jte+state:results
> > 
> > 2. Why cannot the previous setting to the domains be kept when socket is online?
> > Answer: If the asymmetry system is supported, we cannot assume the configuration
> > can be applied to new socket.
> 
> I'm afraid we'd have problems elsewhere if we tried to run Xen on
> a mixed-model system. Unless you can prove PSR/CAT is the only
> relevant (read: affecting Xen's behavior) hardware difference
> between Haswell and Broadwell (for example, isn't the former
> SMEP only, but the latter SMEP+SMAP?), I don't buy this as a
> reason to have more complicated than necessary code.
> 
Sorry, this may cause potential issue and is not a good example. But from SW
view, there is another case where the per-socket supporting is important in
real-time scenarios. You may have a real-time domain on one socket that requires
different masks (especially for code/data) to guarantee its performance vs.
other general-purpose domains run on a different socket. In that case it’s a
heterogeneous software usage model (rather than heterogeneous hardware). And,
we should not force same masks setting on different sockets in such case.
Because CLOS are a scarce software resource which should be wasted. One of
the most important reasons for managing CLOS independently across sockets is
to preserve the flexibility in using CLOS which is key as they are a scarce
resource. 

BRs,
Sun Yi
Jan Beulich April 25, 2017, 8:24 a.m. UTC | #15
>>> On 25.04.17 at 09:15, <yi.y.sun@linux.intel.com> wrote:
> Sorry, this may cause potential issue and is not a good example. But from SW
> view, there is another case where the per-socket supporting is important in
> real-time scenarios. You may have a real-time domain on one socket that requires
> different masks (especially for code/data) to guarantee its performance vs.
> other general-purpose domains run on a different socket. In that case it’s a
> heterogeneous software usage model (rather than heterogeneous hardware). And,
> we should not force same masks setting on different sockets in such case.

I don't follow: The COS IDs for the real-time and general purpose
domains would be different, wouldn't they? Thus there would be
different masks in use, as intended.

Jan
Yi Sun April 25, 2017, 8:40 a.m. UTC | #16
On 17-04-25 02:24:40, Jan Beulich wrote:
> >>> On 25.04.17 at 09:15, <yi.y.sun@linux.intel.com> wrote:
> > Sorry, this may cause potential issue and is not a good example. But from SW
> > view, there is another case where the per-socket supporting is important in
> > real-time scenarios. You may have a real-time domain on one socket that requires
> > different masks (especially for code/data) to guarantee its performance vs.
> > other general-purpose domains run on a different socket. In that case it’s a
> > heterogeneous software usage model (rather than heterogeneous hardware). And,
> > we should not force same masks setting on different sockets in such case.
> 
> I don't follow: The COS IDs for the real-time and general purpose
> domains would be different, wouldn't they? Thus there would be
> different masks in use, as intended.
> 
Yes, you are right. But as above case, the real-time domain only runs on one
socket. If per-socket supporting is enabled, we can allocate this COS ID to
other domains on other sockets. This can help to improve the scarce cache
utilization.

> Jan
diff mbox

Patch

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index a85ea99..ef8d3e9 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -125,6 +125,8 @@  struct feat_node {
     uint32_t cos_reg_val[MAX_COS_REG_CNT];
 };
 
+#define PSR_DOM_IDS_NUM ((DOMID_IDLE + 1) / sizeof(uint32_t))
+
 /*
  * PSR features are managed per socket. Below structure defines the members
  * used to manage these features.
@@ -134,9 +136,11 @@  struct feat_node {
  *             COS ID. Every entry of cos_ref corresponds to one COS ID.
  */
 struct psr_socket_info {
-    struct feat_node *features[PSR_SOCKET_MAX_FEAT];
     spinlock_t ref_lock;
+    spinlock_t dom_ids_lock;
+    struct feat_node *features[PSR_SOCKET_MAX_FEAT];
     unsigned int cos_ref[MAX_COS_REG_CNT];
+    uint32_t dom_ids[PSR_DOM_IDS_NUM];
 };
 
 struct psr_assoc {
@@ -194,26 +198,11 @@  static void free_socket_resources(unsigned int socket)
 {
     unsigned int i;
     struct psr_socket_info *info = socket_info + socket;
-    struct domain *d;
+    unsigned long flag;
 
     if ( !info )
         return;
 
-    /* Restore domain cos id to 0 when socket is offline. */
-    for_each_domain ( d )
-    {
-        unsigned int cos = d->arch.psr_cos_ids[socket];
-        if ( cos == 0 )
-            continue;
-
-        spin_lock(&info->ref_lock);
-        ASSERT(!cos || info->cos_ref[cos]);
-        info->cos_ref[cos]--;
-        spin_unlock(&info->ref_lock);
-
-        d->arch.psr_cos_ids[socket] = 0;
-    }
-
     /*
      * Free resources of features. The global feature object, e.g. feat_l3_cat,
      * may not be freed here if it is not added into array. It is simply being
@@ -221,12 +210,17 @@  static void free_socket_resources(unsigned int socket)
      */
     for ( i = 0; i < PSR_SOCKET_MAX_FEAT; i++ )
     {
-        if ( !info->features[i] )
-            continue;
-
         xfree(info->features[i]);
         info->features[i] = NULL;
     }
+
+    spin_lock(&info->ref_lock);
+    memset(info->cos_ref, 0, MAX_COS_REG_CNT * sizeof(unsigned int));
+    spin_unlock(&info->ref_lock);
+
+    spin_lock_irqsave(&info->dom_ids_lock, flag);
+    memset(info->dom_ids, 0, PSR_DOM_IDS_NUM * sizeof(uint32_t));
+    spin_unlock_irqrestore(&info->dom_ids_lock, flag);
 }
 
 static bool feat_init_done(const struct psr_socket_info *info)
@@ -682,9 +676,34 @@  void psr_ctxt_switch_to(struct domain *d)
         psr_assoc_rmid(&reg, d->arch.psr_rmid);
 
     if ( psra->cos_mask )
-        psr_assoc_cos(&reg, d->arch.psr_cos_ids ?
-                      d->arch.psr_cos_ids[cpu_to_socket(smp_processor_id())] :
-                      0, psra->cos_mask);
+    {
+        unsigned int socket = cpu_to_socket(smp_processor_id());
+        struct psr_socket_info *info = socket_info + socket;
+        int old_bit;
+
+        spin_lock(&info->dom_ids_lock);
+
+        old_bit = test_and_set_bit(d->domain_id, info->dom_ids);
+
+        /*
+         * If old_bit is 0, that means this is the first time the domain is
+         * switched to this socket or domain's COS ID has not been set since
+         * the socket is online. So, the domain's COS ID on this socket should
+         * be default value, 0. If not, that means this socket has been offline
+         * and the domain's COS ID has been set when the socket was online. So,
+         * this COS ID is invalid and we have to restore it to 0.
+         */
+        if ( d->arch.psr_cos_ids &&
+             old_bit == 0 &&
+             d->arch.psr_cos_ids[socket] != 0 )
+            d->arch.psr_cos_ids[socket] = 0;
+
+        spin_unlock(&info->dom_ids_lock);
+
+        psr_assoc_cos(&reg,
+                      d->arch.psr_cos_ids ? d->arch.psr_cos_ids[socket] : 0,
+                      psra->cos_mask);
+    }
 
     if ( reg != psra->val )
     {
@@ -1146,40 +1165,6 @@  static int write_psr_msr(unsigned int socket, unsigned int cos,
     return 0;
 }
 
-static void restore_default_val(unsigned int socket, unsigned int cos,
-                                enum psr_feat_type feat_type)
-{
-    unsigned int i, j;
-    uint32_t default_val;
-    const struct psr_socket_info *info = get_socket_info(socket);
-
-    for ( i = 0; i < PSR_SOCKET_MAX_FEAT; i++ )
-    {
-        const struct feat_node *feat = info->features[i];
-        /*
-         * There are four judgements:
-         * 1. Input 'feat_type' is valid so we have to get feature according to
-         *    it. If current feature type (i) does not match 'feat_type', we
-         *    need skip it, so continue to check next feature.
-         * 2. Input 'feat_type' is 'PSR_SOCKET_MAX_FEAT' which means we should
-         *    handle all features in this case. So, go to next loop.
-         * 3. Do not need restore the COS value back to default if cos_num is 1,
-         *    e.g. L3 CAT. Because next value setting will overwrite it.
-         * 4. 'feat' we got is NULL, continue.
-         */
-        if ( ( feat_type != PSR_SOCKET_MAX_FEAT && feat_type != i ) ||
-             !feat || feat->props->cos_num == 1 )
-            continue;
-
-        for ( j = 0; j < feat->props->cos_num; j++ )
-        {
-            feat->props->get_val(feat, 0, feat->props->type[j], &default_val);
-            write_psr_msr(socket, cos, default_val,
-                          feat->props->type[j], i);
-        }
-    }
-}
-
 /* The whole set process is protected by domctl_lock. */
 int psr_set_val(struct domain *d, unsigned int socket,
                 uint32_t val, enum cbm_type type)
@@ -1191,6 +1176,7 @@  int psr_set_val(struct domain *d, unsigned int socket,
     struct psr_socket_info *info = get_socket_info(socket);
     unsigned int array_len;
     enum psr_feat_type feat_type;
+    unsigned long flag;
 
     if ( IS_ERR(info) )
         return PTR_ERR(info);
@@ -1286,22 +1272,6 @@  int psr_set_val(struct domain *d, unsigned int socket,
     ASSERT(!cos || ref[cos]);
     ASSERT(!old_cos || ref[old_cos]);
     ref[old_cos]--;
-
-    /*
-     * Step 6:
-     * For features,  e.g. CDP, which cos_num is more than 1, we have to
-     * restore the old_cos value back to default when ref[old_cos] is 0.
-     * Otherwise, user will see wrong values when this COS ID is reused. E.g.
-     * user wants to set DATA to 0x3ff for a new domain. He hopes to see the
-     * DATA is set to 0x3ff and CODE should be the default value, 0x7ff. But
-     * if the COS ID picked for this action is the one that has been used by
-     * other domain and the CODE has been set to 0x1ff. Then, user will see
-     * DATA: 0x3ff, CODE: 0x1ff. So, we have to restore COS values for features
-     * using multiple COSs.
-     */
-    if ( old_cos && !ref[old_cos] )
-        restore_default_val(socket, old_cos, feat_type);
-
     spin_unlock(&info->ref_lock);
 
     /*
@@ -1310,7 +1280,10 @@  int psr_set_val(struct domain *d, unsigned int socket,
      * which COS the domain is using on the socket. One domain can only use
      * one COS ID at same time on each socket.
      */
+    spin_lock_irqsave(&info->dom_ids_lock, flag);
     d->arch.psr_cos_ids[socket] = cos;
+    test_and_set_bit(d->domain_id, info->dom_ids);
+    spin_unlock_irqrestore(&info->dom_ids_lock, flag);
 
     xfree(val_array);
     return ret;
@@ -1336,6 +1309,7 @@  static void psr_free_cos(struct domain *d)
     for ( socket = 0; socket < nr_sockets; socket++ )
     {
         struct psr_socket_info *info;
+        unsigned long flag;
 
         /* cos 0 is default one which does not need be handled. */
         cos = d->arch.psr_cos_ids[socket];
@@ -1346,14 +1320,11 @@  static void psr_free_cos(struct domain *d)
         spin_lock(&info->ref_lock);
         ASSERT(!cos || info->cos_ref[cos]);
         info->cos_ref[cos]--;
-        /*
-         * The 'cos_ref[cos]' of 'd' is 0 now so we need restore corresponding
-         * COS registers to default value. Because this case happens when a
-         * domain is destroied, we need restore all features.
-         */
-        if ( !info->cos_ref[cos] )
-            restore_default_val(socket, cos, PSR_SOCKET_MAX_FEAT);
         spin_unlock(&info->ref_lock);
+
+        spin_lock_irqsave(&info->dom_ids_lock, flag);
+        clear_bit(d->domain_id, info->dom_ids);
+        spin_unlock_irqrestore(&info->dom_ids_lock, flag);
     }
 
     xfree(d->arch.psr_cos_ids);
@@ -1453,6 +1424,7 @@  static void psr_cpu_init(void)
         goto assoc_init;
 
     spin_lock_init(&info->ref_lock);
+    spin_lock_init(&info->dom_ids_lock);
 
     cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, &regs);
     if ( regs.b & PSR_RESOURCE_TYPE_L3 )