Message ID | 20170418105533.GK17458@yi.y.sun (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: > I made a test patch based on v10 and attached it in mail. Could you please > help to check it? Thanks! This looks reasonable at the first glance, albeit I continue to be unconvinced that this is the only (reasonable) way of solving the problem. After all we don't have to go through similar hoops for any other of the register state associated with a vCPU. There are a number of cosmetic issues, but commenting on an attached (rather than inlined) patch is a little difficult. One remark regarding the locking though: Acquiring a lock in the context switch path should be made as low risk of long stalls as possible. Therefore you will want to consider using r/w locks instead of spin locks here, which would allow parallelism on all cores of a socket as long as COS IDs aren't being updated. Jan
On 17-04-18 05:46:43, Jan Beulich wrote: > >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: > > I made a test patch based on v10 and attached it in mail. Could you please > > help to check it? Thanks! > > This looks reasonable at the first glance, albeit I continue to be > unconvinced that this is the only (reasonable) way of solving the > problem. After all we don't have to go through similar hoops for > any other of the register state associated with a vCPU. There Sorry, I do not understand your meaning clearly. 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch, we check 'dom_ids' array to know if the domain's cos id has not been set but its 'psr_cos_ids[socket]' already has a non zero value. This case means the socket offline has happened so that we have to restore the domain's 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. I think we have discussed this in previous mails and achieved agreement on such logic. 2. DYM the COS MSRs? We have discussed this before. Per your comments, when socket is online, the registers values may be modified by FW and others. So, we have to restore them to default value which is being done in 'cat_init_feature'. So, what is your exactly meaning here? Thanks! > are a number of cosmetic issues, but commenting on an attached > (rather than inlined) patch is a little difficult. > Sorry for that, I will directly send patch out next time. > One remark regarding the locking though: Acquiring a lock in the > context switch path should be made as low risk of long stalls as > possible. Therefore you will want to consider using r/w locks > instead of spin locks here, which would allow parallelism on all > cores of a socket as long as COS IDs aren't being updated. > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions. So, I do not understand why read-write lock is better? Anything I don't consider? Please help to point out. Thanks! > Jan
>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: > On 17-04-18 05:46:43, Jan Beulich wrote: >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: >> > I made a test patch based on v10 and attached it in mail. Could you please >> > help to check it? Thanks! >> >> This looks reasonable at the first glance, albeit I continue to be >> unconvinced that this is the only (reasonable) way of solving the >> problem. After all we don't have to go through similar hoops for >> any other of the register state associated with a vCPU. There > > Sorry, I do not understand your meaning clearly. > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch, > we check 'dom_ids' array to know if the domain's cos id has not been set but > its 'psr_cos_ids[socket]' already has a non zero value. This case means the > socket offline has happened so that we have to restore the domain's > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. > I think we have discussed this in previous mails and achieved agreement on > such logic. > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when > socket is online, the registers values may be modified by FW and others. > So, we have to restore them to default value which is being done in > 'cat_init_feature'. > > So, what is your exactly meaning here? Thanks! Neither of the two. I'm comparing with COS/PSR-_unrelated_ handling of register state. After all there are other MSRs which we need to put into the right state after a core comes online. >> are a number of cosmetic issues, but commenting on an attached >> (rather than inlined) patch is a little difficult. >> > Sorry for that, I will directly send patch out next time. > >> One remark regarding the locking though: Acquiring a lock in the >> context switch path should be made as low risk of long stalls as >> possible. Therefore you will want to consider using r/w locks >> instead of spin locks here, which would allow parallelism on all >> cores of a socket as long as COS IDs aren't being updated. >> > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions. > So, I do not understand why read-write lock is better? Anything I don't > consider? Please help to point out. Thanks! Hmm, looking again I can see that r/w locks indeed may not help here. However, what you say still doesn't look correct to me: You acquire the lock depending on _only_ psra->cos_mask being non- zero. This means that all cores on one socket are being serialized here. Quite possibly all you need is for some of the checks done inside the locked region to be replicated (but _not_ moved) to the outer if(), to limit the number of times where the lock is to be acquired. Jan
On 17-04-19 03:00:29, Jan Beulich wrote: > >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-18 05:46:43, Jan Beulich wrote: > >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: > >> > I made a test patch based on v10 and attached it in mail. Could you please > >> > help to check it? Thanks! > >> > >> This looks reasonable at the first glance, albeit I continue to be > >> unconvinced that this is the only (reasonable) way of solving the Do you have any other suggestion on this? Thanks! > >> problem. After all we don't have to go through similar hoops for > >> any other of the register state associated with a vCPU. There > > > > Sorry, I do not understand your meaning clearly. > > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch, > > we check 'dom_ids' array to know if the domain's cos id has not been set but > > its 'psr_cos_ids[socket]' already has a non zero value. This case means the > > socket offline has happened so that we have to restore the domain's > > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. > > I think we have discussed this in previous mails and achieved agreement on > > such logic. > > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when > > socket is online, the registers values may be modified by FW and others. > > So, we have to restore them to default value which is being done in > > 'cat_init_feature'. > > > > So, what is your exactly meaning here? Thanks! > > Neither of the two. I'm comparing with COS/PSR-_unrelated_ > handling of register state. After all there are other MSRs which > we need to put into the right state after a core comes online. > For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket is offline. The values in it are all 0 when socket is online again. This causes error when user shows the CBMs. So, we have two options when socket is online: 1. Read COS MSRs values and save them into 'cos_reg_val[]'. 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM. Per our discussion on v9 patch 05, we decided to adopt option 2. Below are what we discussed. FYR. [v9 patch 05] >> >> > + /* cos=0 is reserved as default cbm(all bits within cbm_len are 1). */ >> >> > + feat->cos_reg_val[0] = cat_default_val(cat.cbm_len); >> >> > + /* >> >> > + * To handle cpu offline and then online case, we need read MSRs back to >> >> > + * save values into cos_reg_val array. >> >> > + */ >> >> > + for ( i = 1; i <= cat.cos_max; i++ ) >> >> > + { >> >> > + rdmsrl(MSR_IA32_PSR_L3_MASK(i), val); >> >> > + feat->cos_reg_val[i] = (uint32_t)val; >> >> > + } >> >> [Jan] >> >> You mention this in the changes done, but I don't understand why >> >> you do this. What meaning to these values have to you? If you want >> >> hardware and cached values to match up, the much more conventional >> >> way of enforcing this would be to write the values you actually >> >> want (normally all zero). >> >> [Sun Yi] >> > When all cpus on a socket are offline, the free_feature() is called >> > to free features resources so that the values saved in >> > cos_reg_val[] are lost. When the socket is online again, features >> > are allocated again so that cos_reg_val[] members are all >> > initialized to 0. Only is cos_reg_val[0] initialized to default value in this function in old codes. >> > >> > But domain is still alive so that its cos id on the socket is kept. >> > The corresponding MSR value is kept too per test. To make >> > cos_reg_val[] values be same as HW to not to mislead user, we >> > should read back the valid values on HW into cos_reg_val[]. >> [Jan] >> Okay, I understand the background, but I don't view this solution as >> viable: Once the last core on a socket goes offline, all references >> to it should be cleaned up. After all what will be brought back >> online may be a different physical CPU altogether; you can't assume >> MSR values to have survived even if it is the same CPU which comes >> back online, as it may have undergone a reset cycle, or BIOS/SMM may >> have played with the MSRs. >> That's even a possibility for a single core coming back online, so >> you have to reload MSRs explicitly anyway if implicit reloading (i.e. >> once vCPU-s get scheduled onto it) doesn't suffice. >> [Sun Yi] > So, you think the MSRs values may not be valid after such process and > reloading (write MSRs to default value) is needed. If so, I would like > to do more operations in 'free_feature()': > 1. Iterate all domains working on the offline socket to change > 'd->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init > status. > 2. Restore 'socket_info[socket].cos_ref[]' to all 0. > > These can make the socket's info be totally restored back to init status. [Jan] Yes, that's what I think is needed. > >> are a number of cosmetic issues, but commenting on an attached > >> (rather than inlined) patch is a little difficult. > >> > > Sorry for that, I will directly send patch out next time. > > > >> One remark regarding the locking though: Acquiring a lock in the > >> context switch path should be made as low risk of long stalls as > >> possible. Therefore you will want to consider using r/w locks > >> instead of spin locks here, which would allow parallelism on all > >> cores of a socket as long as COS IDs aren't being updated. > >> > > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions. > > So, I do not understand why read-write lock is better? Anything I don't > > consider? Please help to point out. Thanks! > > Hmm, looking again I can see that r/w locks indeed may not help > here. However, what you say still doesn't look correct to me: You > acquire the lock depending on _only_ psra->cos_mask being non- > zero. This means that all cores on one socket are being serialized > here. Quite possibly all you need is for some of the checks done > inside the locked region to be replicated (but _not_ moved) to the > outer if(), to limit the number of times where the lock is to be > acquired. > I think your suggestions is to check old_cos outer the lock region. If it is not 0 which means the domain's cos id does not need restore to 0, we can directly set it into ASSOC register. That can avoid unnecessary lock. I will send out the test patch again to ask your help to provide review comments (you said there are also 'a number of cosmetics issues'). Thanks! > Jan
>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote: > On 17-04-19 03:00:29, Jan Beulich wrote: >> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: >> > On 17-04-18 05:46:43, Jan Beulich wrote: >> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: >> >> > I made a test patch based on v10 and attached it in mail. Could you please >> >> > help to check it? Thanks! >> >> >> >> This looks reasonable at the first glance, albeit I continue to be >> >> unconvinced that this is the only (reasonable) way of solving the > > Do you have any other suggestion on this? Thanks! I'm sorry, but no. I'm already spending _much_ more time on this series than I should be, considering its (afaic) relatively low priority. I really have to ask you to properly think through both the data layout and code structure, including considering alternatives especially in places where you can _anticipate_ controversy. >> >> problem. After all we don't have to go through similar hoops for >> >> any other of the register state associated with a vCPU. There >> > >> > Sorry, I do not understand your meaning clearly. >> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch, >> > we check 'dom_ids' array to know if the domain's cos id has not been set but >> > its 'psr_cos_ids[socket]' already has a non zero value. This case means the >> > socket offline has happened so that we have to restore the domain's >> > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. >> > I think we have discussed this in previous mails and achieved agreement on >> > such logic. >> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when >> > socket is online, the registers values may be modified by FW and others. >> > So, we have to restore them to default value which is being done in >> > 'cat_init_feature'. >> > >> > So, what is your exactly meaning here? Thanks! >> >> Neither of the two. I'm comparing with COS/PSR-_unrelated_ >> handling of register state. After all there are other MSRs which >> we need to put into the right state after a core comes online. >> > For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket > is offline. The values in it are all 0 when socket is online again. This causes > error when user shows the CBMs. So, we have two options when socket is > online: > 1. Read COS MSRs values and save them into 'cos_reg_val[]'. > 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM. This re-states what you want to do; it does not answer my question. Along the lines of what you say, for example FS and GS base MSRs come back as zero too after a socket has been (re-)onlined. We don't need to go through any hoops there, nevertheless. > Per our discussion on v9 patch 05, we decided to adopt option 2. Below are > what we discussed. FYR. Well, we decided option 2 is better than option 1. I'm still unconvinced there's not a yet better alternative. >> >> are a number of cosmetic issues, but commenting on an attached >> >> (rather than inlined) patch is a little difficult. >> >> >> > Sorry for that, I will directly send patch out next time. >> > >> >> One remark regarding the locking though: Acquiring a lock in the >> >> context switch path should be made as low risk of long stalls as >> >> possible. Therefore you will want to consider using r/w locks >> >> instead of spin locks here, which would allow parallelism on all >> >> cores of a socket as long as COS IDs aren't being updated. >> >> >> > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions. >> > So, I do not understand why read-write lock is better? Anything I don't >> > consider? Please help to point out. Thanks! >> >> Hmm, looking again I can see that r/w locks indeed may not help >> here. However, what you say still doesn't look correct to me: You >> acquire the lock depending on _only_ psra->cos_mask being non- >> zero. This means that all cores on one socket are being serialized >> here. Quite possibly all you need is for some of the checks done >> inside the locked region to be replicated (but _not_ moved) to the >> outer if(), to limit the number of times where the lock is to be >> acquired. >> > I think your suggestions is to check old_cos outer the lock region. My suggestion is to check as much state as possible, to prevent having to acquire the lock whenever possible. > If it is not 0 which means the domain's cos id does not need restore > to 0, we can directly set it into ASSOC register. That can avoid > unnecessary lock. I will send out the test patch again to ask your > help to provide review comments (you said there are also 'a number > of cosmetics issues'). And I would hope you would try to eliminate some (if not all) yourself first. After all you can easily go over your patch yourself, checking for e.g. style violations. Jan
Apologies for stepping in here. But it feels to me that this thread is at risk of becoming unproductive. > On 20 Apr 2017, at 10:43, Jan Beulich <jbeulich@suse.com> wrote: > >>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote: >> On 17-04-19 03:00:29, Jan Beulich wrote: >>>>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: >>>> On 17-04-18 05:46:43, Jan Beulich wrote: >>>>>>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: >>>>>> I made a test patch based on v10 and attached it in mail. Could you please >>>>>> help to check it? Thanks! >>>>> >>>>> This looks reasonable at the first glance, albeit I continue to be >>>>> unconvinced that this is the only (reasonable) way of solving the >> >> Do you have any other suggestion on this? Thanks! > > I'm sorry, but no. I'm already spending _much_ more time on this > series than I should be, considering its (afaic) relatively low > priority. I really have to ask you to properly think through both > the data layout and code structure, including considering > alternatives especially in places where you can _anticipate_ > controversy. Jan, can you confirm whether you are happy with the current proposal. You say "this looks reasonable at the first glance", but then go on to say that there may be other "reasonable ways" of solving the problem at hand. This is not very concrete and hard to respond to from a contributors point of view: it would be good to establish whether a) the v10 proposal in this area is good enough, b) whether there are any concrete improvements to be made. You say please think through whether "you can _anticipate_ controversy", but at the same time you can't currently identify/think of any issues. That seems to suggest to me that maybe the proposal is good enough. Or that something is missing, which has not been articulated. Taking into account language barriers, more clarity would probably be helpful. >> Per our discussion on v9 patch 05, we decided to adopt option 2. Below are >> what we discussed. FYR. > > Well, we decided option 2 is better than option 1. I'm still > unconvinced there's not a yet better alternative. I suppose that is the same type of argument. Aka we looked at a number of options, seem to have agreed one is better than the other. But there is no clarity as to whether in this case option 2 is good enough and what concrete steps would be needed to get to a better alternative. Of course I may have missed some of the context here in some of the older threads and thus I may have missed some of the context. >> If it is not 0 which means the domain's cos id does not need restore >> to 0, we can directly set it into ASSOC register. That can avoid >> unnecessary lock. I will send out the test patch again to ask your >> help to provide review comments (you said there are also 'a number >> of cosmetics issues'). > > And I would hope you would try to eliminate some (if not all) yourself > first. After all you can easily go over your patch yourself, checking > for e.g. style violations. I think this is fair enough. Best Regards Lars
>>> On 20.04.17 at 15:02, <lars.kurth.xen@gmail.com> wrote: >> On 20 Apr 2017, at 10:43, Jan Beulich <jbeulich@suse.com> wrote: >> >>>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote: >>> On 17-04-19 03:00:29, Jan Beulich wrote: >>>>>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: >>>>> On 17-04-18 05:46:43, Jan Beulich wrote: >>>>>>>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: >>>>>>> I made a test patch based on v10 and attached it in mail. Could you please >>>>>>> help to check it? Thanks! >>>>>> >>>>>> This looks reasonable at the first glance, albeit I continue to be >>>>>> unconvinced that this is the only (reasonable) way of solving the >>> >>> Do you have any other suggestion on this? Thanks! >> >> I'm sorry, but no. I'm already spending _much_ more time on this >> series than I should be, considering its (afaic) relatively low >> priority. I really have to ask you to properly think through both >> the data layout and code structure, including considering >> alternatives especially in places where you can _anticipate_ >> controversy. > > Jan, can you confirm whether you are happy with the current proposal. You > say "this looks reasonable at the first glance", but then go on to say that > there may be other "reasonable ways" of solving the problem at hand. > > This is not very concrete and hard to respond to from a contributors point > of view: it would be good to establish whether a) the v10 proposal in this > area is good enough, b) whether there are any concrete improvements to be > made. I understand it's not very concrete, but please understand that with the over 100 patches wanting looking at right now it is simply impossible for me to give precise suggestions everywhere. I really have to be allowed to defer to the originator to come up with possible better mechanisms (or defend what there is by addressing questions raised), especially with - as said - the amount of time spent here already having been way higher than is justifiable. Just go and compare v10 with one of the initial versions: Almost all of the data layout and code flow have fundamentally changed, mostly based on feedback I gave. I'm sorry for saying that, but to me this is an indication that things hadn't been thought through well in the design phase, i.e. before even submitting a first RFC. > You say please think through whether "you can _anticipate_ controversy", but > at the same time you can't currently identify/think of any issues. That seems > to suggest to me that maybe the proposal is good enough. Or that something is > missing, which has not been articulated. Taking into account language > barriers, more clarity would probably be helpful. I've given criteria by which I have the gut feeling (but no more) that this isn't the right approach. I'm absolutely fine to be convinced that my gut feeling is wrong. That would require to simply answer the question I raised multiple times, and which was repeatedly "answered" by simply re-stating previously expressed facts. If this reaction of mine is not acceptable, all I can do is refrain from further looking at this series. And Yi, I certainly apologize for perhaps not doing these reviews wholeheartedly, since - as also expressed before - I continue to not really view this as very important functionality. Yet considering for how long some of the versions hadn't been looked at by anyone at all, the alternative would have been to simply let it sit further without looking at it. I actually take this lack of interest by others as an indication that I'm not the only one considering this nice to have, but no more. Jan
> On 20 Apr 2017, at 14:21, Jan Beulich <jbeulich@suse.com> wrote: > >>>> On 20.04.17 at 15:02, <lars.kurth.xen@gmail.com> wrote: >>> On 20 Apr 2017, at 10:43, Jan Beulich <jbeulich@suse.com> wrote: >>> >>>>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote: >>>> On 17-04-19 03:00:29, Jan Beulich wrote: >>>>>>>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: >>>>>> On 17-04-18 05:46:43, Jan Beulich wrote: >>>>>>>>>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: >>>>>>>> I made a test patch based on v10 and attached it in mail. Could you please >>>>>>>> help to check it? Thanks! >>>>>>> [Item 1] >>>>>>> This looks reasonable at the first glance, albeit I continue to be >>>>>>> unconvinced that this is the only (reasonable) way of solving the >>>> >>>> Do you have any other suggestion on this? Thanks! >>> >>> I'm sorry, but no. I'm already spending _much_ more time on this >>> series than I should be, considering its (afaic) relatively low >>> priority. I really have to ask you to properly think through both >>> the data layout and code structure, including considering >>> alternatives especially in places where you can _anticipate_ >>> controversy. >> >> Jan, can you confirm whether you are happy with the current proposal. You >> say "this looks reasonable at the first glance", but then go on to say that >> there may be other "reasonable ways" of solving the problem at hand. >> >> This is not very concrete and hard to respond to from a contributors point >> of view: it would be good to establish whether a) the v10 proposal in this >> area is good enough, b) whether there are any concrete improvements to be >> made. > I understand it's not very concrete, but please understand that with > the over 100 patches wanting looking at right now it is simply > impossible for me to give precise suggestions everywhere. I really > have to be allowed to defer to the originator to come up with > possible better mechanisms (or defend what there is by addressing > questions raised), Jan, I don't object to the principle of deferring issues to a contributor, for contributor to defend their viewpoint or to come up with a better mechanism. I just observed, that I could not make a lot of sense what you were looking for in this particular review. I am assuming that it would be even harder for Yi. > especially with - as said - the amount of time spent > here already having been way higher than is justifiable. And of course we have passed the 4.9 code freeze, so some of the pressure is off. At the same time I understand that because of the upcoming releases you need to focus on bug fixes, etc. > Just go and > compare v10 with one of the initial versions: Almost all of the data > layout and code flow have fundamentally changed, mostly based on > feedback I gave. I'm sorry for saying that, but to me this is an > indication that things hadn't been thought through well in the design > phase, i.e. before even submitting a first RFC. That is good feedback which may contain some valuable lessons. Once we are through this (or maybe at the summit) it may be worthwhile to look at what has gone wrong and see how we can do better in future. [Item 2] >> You say please think through whether "you can _anticipate_ controversy", but >> at the same time you can't currently identify/think of any issues. That seems >> to suggest to me that maybe the proposal is good enough. Or that something is >> missing, which has not been articulated. Taking into account language >> barriers, more clarity would probably be helpful. > > I've given criteria by which I have the gut feeling (but no more) > that this isn't the right approach. I'm absolutely fine to be > convinced that my gut feeling is wrong. That would require to > simply answer the question I raised multiple times, and which was > repeatedly "answered" by simply re-stating previously expressed > facts. I have not followed the full thread. But it seems that we have communications issue there. Normally this happens when expectations don't quite match and one party (in this case Yi) does not quite get what the other one is looking for. Maybe the best approach would be for Yi to get some of these things resolved during a short IRC conversation with you. I did see him and others resolve some previous issues more effectively on IRC. > If this reaction of mine is not acceptable, all I can do is refrain > from further looking at this series. And Yi, I certainly apologize > for perhaps not doing these reviews wholeheartedly, since - > as also expressed before - I continue to not really view this as > very important functionality. As I said earlier, I stepped in, as I didn't really understand what was going on. I think this is a little clearer now. So to summarise: On item 1: it appears that you are looking for a some more justification why some of the changes were made, maybe with a rationale for some of the choices that were made. Given that this is quite a complex series which has diverged quite a lot from the design, the goal is to make it easier for either you (or someone else) to sanity check the proposal which on the face of things look OK. But you have some doubts and you can't easily check against the design as it is out-of-date. On item 2: you think something may not be quite right, but you can't really decide until a couple of questions (not quite sure which, but I am sure Yi can locate them) are answered. Let me know whether this is actually true. > Yet considering for how long some > of the versions hadn't been looked at by anyone at all, the > alternative would have been to simply let it sit further without > looking at it. I actually take this lack of interest by others as an > indication that I'm not the only one considering this nice to have, > but no more. That's a good point. There have been some comments from others, but it is also true that 66% of comments on this series did come from you (see https://xen.biterg.io:443/goto/72a753919aa8e3205d03da7b430d2696). On the other hand, this is also a series which is hard to hand over to someone else. Regards Lars
> If this reaction of mine is not acceptable, all I can do is refrain > from further looking at this series. And Yi, I certainly apologize > for perhaps not doing these reviews wholeheartedly, since - > as also expressed before - I continue to not really view this as > very important functionality. Yet considering for how long some > of the versions hadn't been looked at by anyone at all, the > alternative would have been to simply let it sit further without > looking at it. I actually take this lack of interest by others as an > indication that I'm not the only one considering this nice to have, > but no more. I do have an interest in this series. And I can certainly give it a review - but once you get your teeth in a patchset I feel it is bit counterintuitive to review it - as you do a much much better job that I could have.
>>> On 20.04.17 at 18:52, <lars.kurth.xen@gmail.com> wrote: > So to summarise: > > On item 1: it appears that you are looking for a some more justification why > some of the changes were made, maybe with a rationale for some of the choices > that were made. Given that this is quite a complex series which has diverged > quite a lot from the design, the goal is to make it easier for either you (or > someone else) to sanity check the proposal which on the face of things look > OK. But you have some doubts and you can't easily check against the design as > it is out-of-date. > > On item 2: you think something may not be quite right, but you can't really > decide until a couple of questions (not quite sure which, but I am sure Yi > can locate them) are answered. > > Let me know whether this is actually true. Well, afaict both actually boil down to the same single question regarding the special handling of CAT MSRs after onlining (at runtime) a core on a socket all of whose cores had been offline, namely considering that other CPU registers don't require any such special treatment (in context switch code or elsewhere). As to the design possibly being out of date - I have to admit I didn't even check whether the accompanying documentation has been kept up to date with the actual code changes. The matter here really isn't with comparing with the design, but rather whether the design choice (written down or not) was an appropriate one. Jan
>>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote: > On 17-04-19 03:00:29, Jan Beulich wrote: >> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: >> > On 17-04-18 05:46:43, Jan Beulich wrote: >> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: >> >> > I made a test patch based on v10 and attached it in mail. Could you please >> >> > help to check it? Thanks! >> >> >> >> This looks reasonable at the first glance, albeit I continue to be >> >> unconvinced that this is the only (reasonable) way of solving the > > Do you have any other suggestion on this? Thanks! > >> >> problem. After all we don't have to go through similar hoops for >> >> any other of the register state associated with a vCPU. There >> > >> > Sorry, I do not understand your meaning clearly. >> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch, >> > we check 'dom_ids' array to know if the domain's cos id has not been set but >> > its 'psr_cos_ids[socket]' already has a non zero value. This case means the >> > socket offline has happened so that we have to restore the domain's >> > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. >> > I think we have discussed this in previous mails and achieved agreement on >> > such logic. >> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when >> > socket is online, the registers values may be modified by FW and others. >> > So, we have to restore them to default value which is being done in >> > 'cat_init_feature'. >> > >> > So, what is your exactly meaning here? Thanks! >> >> Neither of the two. I'm comparing with COS/PSR-_unrelated_ >> handling of register state. After all there are other MSRs which >> we need to put into the right state after a core comes online. >> > For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket > is offline. The values in it are all 0 when socket is online again. This causes > error when user shows the CBMs. So, we have two options when socket is online: > 1. Read COS MSRs values and save them into 'cos_reg_val[]'. > 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM. > > Per our discussion on v9 patch 05, we decided to adopt option 2. Btw., having thought some more about this, putting code into the context switch path when there is an alternative is probably the wrong thing after all, i.e. if special treatment _is_ really needed, doing it in less frequently executed code would likely be better. But as before - much depends on clarifying why special treatment is needed here, but not elsewhere (and to avoid questions, with "elsewhere" I mean outside of PSR/CAT code - there's plenty of other CPU register state to take as reference). Jan
On 17-04-21 00:18:27, Jan Beulich wrote: > >>> On 20.04.17 at 04:14, <yi.y.sun@linux.intel.com> wrote: > > On 17-04-19 03:00:29, Jan Beulich wrote: > >> >>> On 19.04.17 at 10:22, <yi.y.sun@linux.intel.com> wrote: > >> > On 17-04-18 05:46:43, Jan Beulich wrote: > >> >> >>> On 18.04.17 at 12:55, <yi.y.sun@linux.intel.com> wrote: > >> >> > I made a test patch based on v10 and attached it in mail. Could you please > >> >> > help to check it? Thanks! > >> >> > >> >> This looks reasonable at the first glance, albeit I continue to be > >> >> unconvinced that this is the only (reasonable) way of solving the > > > > Do you have any other suggestion on this? Thanks! > > > >> >> problem. After all we don't have to go through similar hoops for > >> >> any other of the register state associated with a vCPU. There > >> > > >> > Sorry, I do not understand your meaning clearly. > >> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch, > >> > we check 'dom_ids' array to know if the domain's cos id has not been set but > >> > its 'psr_cos_ids[socket]' already has a non zero value. This case means the > >> > socket offline has happened so that we have to restore the domain's > >> > 'psr_cos_ids[socket]' to default value 0 which points to default COS MSR. > >> > I think we have discussed this in previous mails and achieved agreement on > >> > such logic. > >> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when > >> > socket is online, the registers values may be modified by FW and others. > >> > So, we have to restore them to default value which is being done in > >> > 'cat_init_feature'. > >> > > >> > So, what is your exactly meaning here? Thanks! > >> > >> Neither of the two. I'm comparing with COS/PSR-_unrelated_ > >> handling of register state. After all there are other MSRs which > >> we need to put into the right state after a core comes online. > >> > > For PSR feature, the 'cos_reg_val[]' of the feature is destroied when socket > > is offline. The values in it are all 0 when socket is online again. This causes > > error when user shows the CBMs. So, we have two options when socket is online: > > 1. Read COS MSRs values and save them into 'cos_reg_val[]'. > > 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM. > > > > Per our discussion on v9 patch 05, we decided to adopt option 2. > > Btw., having thought some more about this, putting code into the > context switch path when there is an alternative is probably the > wrong thing after all, i.e. if special treatment _is_ really needed, > doing it in less frequently executed code would likely be better. > But as before - much depends on clarifying why special treatment > is needed here, but not elsewhere (and to avoid questions, with > "elsewhere" I mean outside of PSR/CAT code - there's plenty of > other CPU register state to take as reference). > Hi, Jan, As what we talked on IRC last Friday, I have got answers for your two final questions below: 1. Why domain setting is designed to per-socket, any reason? Answer: There is a real case from Intel's customer. HSX (Haswell server) and BDX (Broadwell server) processors are plugged into each socket of a Grantley platform. HSX does not support CAT but BDX does. You and Chao Peng discussed this before for CAT feature enabling patches. The asymmetry supporting is agreed. http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs3ti+state:results http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63ejte+state:results 2. Why cannot the previous setting to the domains be kept when socket is online? Answer: If the asymmetry system is supported, we cannot assume the configuration can be applied to new socket. BRs, Sun Yi
>>> On 24.04.17 at 08:40, <yi.y.sun@linux.intel.com> wrote: > As what we talked on IRC last Friday, I have got answers for your > two final questions below: > 1. Why domain setting is designed to per-socket, any reason? > Answer: There is a real case from Intel's customer. HSX (Haswell server) > and BDX (Broadwell server) processors are plugged into each socket of a > Grantley platform. HSX does not support CAT but BDX does. > > You and Chao Peng discussed this before for CAT feature enabling patches. > The asymmetry supporting is agreed. I don't see any agreement in those threads. The first sub-thread is merely mentioning this configuration, while the second is only about nr_sockets calculation. > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs > 3ti+state:results > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63e > jte+state:results > > 2. Why cannot the previous setting to the domains be kept when socket is online? > Answer: If the asymmetry system is supported, we cannot assume the configuration > can be applied to new socket. I'm afraid we'd have problems elsewhere if we tried to run Xen on a mixed-model system. Unless you can prove PSR/CAT is the only relevant (read: affecting Xen's behavior) hardware difference between Haswell and Broadwell (for example, isn't the former SMEP only, but the latter SMEP+SMAP?), I don't buy this as a reason to have more complicated than necessary code. Jan
On 17-04-24 00:55:38, Jan Beulich wrote: > >>> On 24.04.17 at 08:40, <yi.y.sun@linux.intel.com> wrote: > > As what we talked on IRC last Friday, I have got answers for your > > two final questions below: > > 1. Why domain setting is designed to per-socket, any reason? > > Answer: There is a real case from Intel's customer. HSX (Haswell server) > > and BDX (Broadwell server) processors are plugged into each socket of a > > Grantley platform. HSX does not support CAT but BDX does. > > > > You and Chao Peng discussed this before for CAT feature enabling patches. > > The asymmetry supporting is agreed. > > I don't see any agreement in those threads. The first sub-thread is > merely mentioning this configuration, while the second is only about > nr_sockets calculation. > > > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs > > 3ti+state:results > > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63e > > jte+state:results > > > > 2. Why cannot the previous setting to the domains be kept when socket is online? > > Answer: If the asymmetry system is supported, we cannot assume the configuration > > can be applied to new socket. > > I'm afraid we'd have problems elsewhere if we tried to run Xen on > a mixed-model system. Unless you can prove PSR/CAT is the only > relevant (read: affecting Xen's behavior) hardware difference > between Haswell and Broadwell (for example, isn't the former > SMEP only, but the latter SMEP+SMAP?), I don't buy this as a > reason to have more complicated than necessary code. > Sorry, this may cause potential issue and is not a good example. But from SW view, there is another case where the per-socket supporting is important in real-time scenarios. You may have a real-time domain on one socket that requires different masks (especially for code/data) to guarantee its performance vs. other general-purpose domains run on a different socket. In that case it’s a heterogeneous software usage model (rather than heterogeneous hardware). And, we should not force same masks setting on different sockets in such case. Because CLOS are a scarce software resource which should be wasted. One of the most important reasons for managing CLOS independently across sockets is to preserve the flexibility in using CLOS which is key as they are a scarce resource. BRs, Sun Yi
>>> On 25.04.17 at 09:15, <yi.y.sun@linux.intel.com> wrote: > Sorry, this may cause potential issue and is not a good example. But from SW > view, there is another case where the per-socket supporting is important in > real-time scenarios. You may have a real-time domain on one socket that requires > different masks (especially for code/data) to guarantee its performance vs. > other general-purpose domains run on a different socket. In that case it’s a > heterogeneous software usage model (rather than heterogeneous hardware). And, > we should not force same masks setting on different sockets in such case. I don't follow: The COS IDs for the real-time and general purpose domains would be different, wouldn't they? Thus there would be different masks in use, as intended. Jan
On 17-04-25 02:24:40, Jan Beulich wrote: > >>> On 25.04.17 at 09:15, <yi.y.sun@linux.intel.com> wrote: > > Sorry, this may cause potential issue and is not a good example. But from SW > > view, there is another case where the per-socket supporting is important in > > real-time scenarios. You may have a real-time domain on one socket that requires > > different masks (especially for code/data) to guarantee its performance vs. > > other general-purpose domains run on a different socket. In that case it’s a > > heterogeneous software usage model (rather than heterogeneous hardware). And, > > we should not force same masks setting on different sockets in such case. > > I don't follow: The COS IDs for the real-time and general purpose > domains would be different, wouldn't they? Thus there would be > different masks in use, as intended. > Yes, you are right. But as above case, the real-time domain only runs on one socket. If per-socket supporting is enabled, we can allocate this COS ID to other domains on other sockets. This can help to improve the scarce cache utilization. > Jan
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index a85ea99..ef8d3e9 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -125,6 +125,8 @@ struct feat_node { uint32_t cos_reg_val[MAX_COS_REG_CNT]; }; +#define PSR_DOM_IDS_NUM ((DOMID_IDLE + 1) / sizeof(uint32_t)) + /* * PSR features are managed per socket. Below structure defines the members * used to manage these features. @@ -134,9 +136,11 @@ struct feat_node { * COS ID. Every entry of cos_ref corresponds to one COS ID. */ struct psr_socket_info { - struct feat_node *features[PSR_SOCKET_MAX_FEAT]; spinlock_t ref_lock; + spinlock_t dom_ids_lock; + struct feat_node *features[PSR_SOCKET_MAX_FEAT]; unsigned int cos_ref[MAX_COS_REG_CNT]; + uint32_t dom_ids[PSR_DOM_IDS_NUM]; }; struct psr_assoc { @@ -194,26 +198,11 @@ static void free_socket_resources(unsigned int socket) { unsigned int i; struct psr_socket_info *info = socket_info + socket; - struct domain *d; + unsigned long flag; if ( !info ) return; - /* Restore domain cos id to 0 when socket is offline. */ - for_each_domain ( d ) - { - unsigned int cos = d->arch.psr_cos_ids[socket]; - if ( cos == 0 ) - continue; - - spin_lock(&info->ref_lock); - ASSERT(!cos || info->cos_ref[cos]); - info->cos_ref[cos]--; - spin_unlock(&info->ref_lock); - - d->arch.psr_cos_ids[socket] = 0; - } - /* * Free resources of features. The global feature object, e.g. feat_l3_cat, * may not be freed here if it is not added into array. It is simply being @@ -221,12 +210,17 @@ static void free_socket_resources(unsigned int socket) */ for ( i = 0; i < PSR_SOCKET_MAX_FEAT; i++ ) { - if ( !info->features[i] ) - continue; - xfree(info->features[i]); info->features[i] = NULL; } + + spin_lock(&info->ref_lock); + memset(info->cos_ref, 0, MAX_COS_REG_CNT * sizeof(unsigned int)); + spin_unlock(&info->ref_lock); + + spin_lock_irqsave(&info->dom_ids_lock, flag); + memset(info->dom_ids, 0, PSR_DOM_IDS_NUM * sizeof(uint32_t)); + spin_unlock_irqrestore(&info->dom_ids_lock, flag); } static bool feat_init_done(const struct psr_socket_info *info) @@ -682,9 +676,34 @@ void psr_ctxt_switch_to(struct domain *d) psr_assoc_rmid(®, d->arch.psr_rmid); if ( psra->cos_mask ) - psr_assoc_cos(®, d->arch.psr_cos_ids ? - d->arch.psr_cos_ids[cpu_to_socket(smp_processor_id())] : - 0, psra->cos_mask); + { + unsigned int socket = cpu_to_socket(smp_processor_id()); + struct psr_socket_info *info = socket_info + socket; + int old_bit; + + spin_lock(&info->dom_ids_lock); + + old_bit = test_and_set_bit(d->domain_id, info->dom_ids); + + /* + * If old_bit is 0, that means this is the first time the domain is + * switched to this socket or domain's COS ID has not been set since + * the socket is online. So, the domain's COS ID on this socket should + * be default value, 0. If not, that means this socket has been offline + * and the domain's COS ID has been set when the socket was online. So, + * this COS ID is invalid and we have to restore it to 0. + */ + if ( d->arch.psr_cos_ids && + old_bit == 0 && + d->arch.psr_cos_ids[socket] != 0 ) + d->arch.psr_cos_ids[socket] = 0; + + spin_unlock(&info->dom_ids_lock); + + psr_assoc_cos(®, + d->arch.psr_cos_ids ? d->arch.psr_cos_ids[socket] : 0, + psra->cos_mask); + } if ( reg != psra->val ) { @@ -1146,40 +1165,6 @@ static int write_psr_msr(unsigned int socket, unsigned int cos, return 0; } -static void restore_default_val(unsigned int socket, unsigned int cos, - enum psr_feat_type feat_type) -{ - unsigned int i, j; - uint32_t default_val; - const struct psr_socket_info *info = get_socket_info(socket); - - for ( i = 0; i < PSR_SOCKET_MAX_FEAT; i++ ) - { - const struct feat_node *feat = info->features[i]; - /* - * There are four judgements: - * 1. Input 'feat_type' is valid so we have to get feature according to - * it. If current feature type (i) does not match 'feat_type', we - * need skip it, so continue to check next feature. - * 2. Input 'feat_type' is 'PSR_SOCKET_MAX_FEAT' which means we should - * handle all features in this case. So, go to next loop. - * 3. Do not need restore the COS value back to default if cos_num is 1, - * e.g. L3 CAT. Because next value setting will overwrite it. - * 4. 'feat' we got is NULL, continue. - */ - if ( ( feat_type != PSR_SOCKET_MAX_FEAT && feat_type != i ) || - !feat || feat->props->cos_num == 1 ) - continue; - - for ( j = 0; j < feat->props->cos_num; j++ ) - { - feat->props->get_val(feat, 0, feat->props->type[j], &default_val); - write_psr_msr(socket, cos, default_val, - feat->props->type[j], i); - } - } -} - /* The whole set process is protected by domctl_lock. */ int psr_set_val(struct domain *d, unsigned int socket, uint32_t val, enum cbm_type type) @@ -1191,6 +1176,7 @@ int psr_set_val(struct domain *d, unsigned int socket, struct psr_socket_info *info = get_socket_info(socket); unsigned int array_len; enum psr_feat_type feat_type; + unsigned long flag; if ( IS_ERR(info) ) return PTR_ERR(info); @@ -1286,22 +1272,6 @@ int psr_set_val(struct domain *d, unsigned int socket, ASSERT(!cos || ref[cos]); ASSERT(!old_cos || ref[old_cos]); ref[old_cos]--; - - /* - * Step 6: - * For features, e.g. CDP, which cos_num is more than 1, we have to - * restore the old_cos value back to default when ref[old_cos] is 0. - * Otherwise, user will see wrong values when this COS ID is reused. E.g. - * user wants to set DATA to 0x3ff for a new domain. He hopes to see the - * DATA is set to 0x3ff and CODE should be the default value, 0x7ff. But - * if the COS ID picked for this action is the one that has been used by - * other domain and the CODE has been set to 0x1ff. Then, user will see - * DATA: 0x3ff, CODE: 0x1ff. So, we have to restore COS values for features - * using multiple COSs. - */ - if ( old_cos && !ref[old_cos] ) - restore_default_val(socket, old_cos, feat_type); - spin_unlock(&info->ref_lock); /* @@ -1310,7 +1280,10 @@ int psr_set_val(struct domain *d, unsigned int socket, * which COS the domain is using on the socket. One domain can only use * one COS ID at same time on each socket. */ + spin_lock_irqsave(&info->dom_ids_lock, flag); d->arch.psr_cos_ids[socket] = cos; + test_and_set_bit(d->domain_id, info->dom_ids); + spin_unlock_irqrestore(&info->dom_ids_lock, flag); xfree(val_array); return ret; @@ -1336,6 +1309,7 @@ static void psr_free_cos(struct domain *d) for ( socket = 0; socket < nr_sockets; socket++ ) { struct psr_socket_info *info; + unsigned long flag; /* cos 0 is default one which does not need be handled. */ cos = d->arch.psr_cos_ids[socket]; @@ -1346,14 +1320,11 @@ static void psr_free_cos(struct domain *d) spin_lock(&info->ref_lock); ASSERT(!cos || info->cos_ref[cos]); info->cos_ref[cos]--; - /* - * The 'cos_ref[cos]' of 'd' is 0 now so we need restore corresponding - * COS registers to default value. Because this case happens when a - * domain is destroied, we need restore all features. - */ - if ( !info->cos_ref[cos] ) - restore_default_val(socket, cos, PSR_SOCKET_MAX_FEAT); spin_unlock(&info->ref_lock); + + spin_lock_irqsave(&info->dom_ids_lock, flag); + clear_bit(d->domain_id, info->dom_ids); + spin_unlock_irqrestore(&info->dom_ids_lock, flag); } xfree(d->arch.psr_cos_ids); @@ -1453,6 +1424,7 @@ static void psr_cpu_init(void) goto assoc_init; spin_lock_init(&info->ref_lock); + spin_lock_init(&info->dom_ids_lock); cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, ®s); if ( regs.b & PSR_RESOURCE_TYPE_L3 )