Message ID | 1504603957-5389-2-git-send-email-yi.y.sun@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, Sep 05, 2017 at 05:32:23PM +0800, Yi Sun wrote: > +* xl interfaces: > + > + 1. `psr-mba-show [domain-id]`: Is this limited to domain-id, or one can also use the domain name? Most of the xl commands accept either a domain-id or a domain-name. > + > + Show memory bandwidth throttling for domain. Under different modes, it > + shows different type of data. > + > + There are two modes: > + Linear mode: the response of throttling value is linear. > + Non-linear mode: the response of throttling value is non-linear. > + > + For linear mode, it shows the decimal value. For non-linear mode, it shows > + hexadecimal value. > + > + 2. `psr-mba-set [OPTIONS] <domain-id> <throttling>`: > + > + Set memory bandwidth throttling for domain. > + > + Options: > + '-s': Specify the socket to process, otherwise all sockets are processed. > + > + Throttling value set in register implies the approximate amount of delaying > + the traffic between core and memory. The higher throttling value results in > + lower bandwidth. The max throttling value (MBA_MAX) supported can be got s/got/obtained/ > + through CPUID. How can one get this value empirically? Do I need to use a external tool? > + > + Linear mode: the input precision is defined as 100-(MBA_MAX). For instance, > + if the MBA_MAX value is 90, the input precision is 10%. Values not an even > + multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10% > + delay applied) by HW automatically. > + > + Non-linear mode: input delay values are powers-of-two from zero to the > + MBA_MAX value from CPUID. In this case any values not a power of two will > + be rounded down the next nearest power of two by HW automatically. Both of the above descriptions should be moved to mba-show IMHO, the description there is incomplete and not helpful. > + > +# Technical details > + > +MBA is a member of Intel PSR features, it shares the base PSR infrastructure > +in Xen. 
> + > +## Hardware perspective > + > + MBA defines a range of MSRs to support specifying a delay value (Thrtl) per > + COS, with details below. > + > + ``` > + +----------------------------+----------------+ > + | MSR (per socket) | Address | > + +----------------------------+----------------+ > + | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50 | > + +----------------------------+----------------+ > + | ... | ... | > + +----------------------------+----------------+ > + | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n | > + +----------------------------+----------------+ > + ``` > + > + When context switch happens, the COS ID of domain is written to per-thread MSR > + `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according I think this is missing some context of the relation between a thread and the MSR. I assume it's related to IA32_PQR_ASSOC, but I have no idea what that constant means. What's more, Xen doesn't have threads, so you should maybe speak about vCPUs instead? > + to the throttling value stored in the Thrtl MSR register. > + > +## The relationship between MBA and CAT/CDP > + > + Generally speaking, MBA is completely independent of CAT/CDP, and any > + combination may be applied at any time, e.g. enabling MBA with CAT > + disabled. > + > + But it needs to be noticed that MBA shares COS infrastructure with CAT, > + although MBA is enumerated by different CPUID leaf from CAT (which > + indicates that the max COS of MBA may be different from CAT). In some > + cases, a domain is permitted to have a COS that is beyond one (or more) > + of PSR features but within the others. For instance, let's assume the max > + COS of MBA is 8 but the max COS of L3 CAT is 16, when a domain is assigned > + 9 as COS, the L3 CAT CBM associated to COS 9 would be enforced, but for MBA, > + the HW works as default value is set since COS 9 is beyond the max COS (8) > + of MBA. 
> + > +## Design Overview > + > +* Core COS/Thrtl association > + > + When enforcing Memory Bandwidth Allocation, all cores of domains have > + the same default Thrtl MSR (COS0) which stores the same Thrtl (0). The > + default Thrtl MSR is used only in hypervisor and is transparent to tool stack > + and user. > + > + System administrators can change PSR allocation policy at runtime by > + using the tool stack. Since MBA shares COS ID with CAT/CDP, a COS ID > + corresponds to a 2-tuple, like [CBM, Thrtl] with only-CAT enabled, when CDP > + is enabled, the COS ID corresponds to a 3-tuple, like [Code_CBM, Data_CBM, > + Thrtl]. If neither CAT nor CDP is enabled, things are easier, since one COS > + ID corresponds to one Thrtl. > + > +* VCPU schedule > + > + This part reuses CAT COS infrastructure. > + > +* Multi-sockets > + > + Different sockets may have different MBA ability (like max COS) > + although it is consistent on the same socket. So the capability > + of per-socket MBA is specified. > + > + This part reuses CAT COS infrastructure. > + > +## Implementation Description > + > +* Hypervisor interfaces: > + > + 1. Boot line param: "psr=mba" to enable the feature. > + > + 2. SYSCTL: > + - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information. So this is likely how one gets the mentioned MBA_MAX? > + > + 3. DOMCTL: > + - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain. > + - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain. > + > +* xl interfaces: > + > + 1. psr-mba-show [domain-id] > + Show system/domain runtime MBA throttling value. For linear mode, > + it shows the decimal value. For non-linear mode, it shows hexadecimal > + value. > + => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL > + > + 2. psr-mba-set [OPTIONS] <domain-id> <throttling> > + Set bandwidth throttling for a domain. > + => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL > + > + 3. psr-hwinfo > + Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA. 
> + => XEN_SYSCTL_PSR_MBA_get_info 'psr-hwinfo' seems to be completely missing from the 'xl interfaces:' section above. > +* Key data structure: > + > + 1. Feature HW info > + > + ``` > + struct { > + unsigned int thrtl_max; > + bool linear; > + } mba; > + > + - Member `thrtl_max` > + > + `thrtl_max` is the max throttling value to be set, i.e. MBA_MAX. > + > + - Member `linear` > + > + `linear` means the response of delay value is linear or not. > + > + As mentioned above, MBA is a member of Intel PSR features, it would > + share the base PSR infrastructure in Xen. For example, the 'cos_max' > + is a common HW property for all features. So, for other data structure > + details, please refer 'intel_psr_cat_cdp.pandoc'. ^ to > + > +# Limitations > + > +MBA can only work on HW which enables it (check by CPUID). ^ s/enables/supports/. > + > +# Testing > + > +We can execute these commands to verify MBA on different HWs supporting them. > + > +For example: > + 1. User can get the MBA hardware info through 'psr-hwinfo' command. From > + result, user can know if this hardware works under linear mode or non- > + linear mode, the max throttling value (MBA_MAX) and so on. > + > + root@:~$ xl psr-hwinfo --mba > + Memory Bandwidth Allocation (MBA): > + Socket ID : 0 > + Linear Mode : Enabled > + Maximum COS : 7 > + Maximum Throttling Value: 90 > + Default Throttling Value: 0 > + > + 2. Then, user can set a throttling value to a domain. For example, set '0xa', > + i.e 10% delay. > + > + root@:~$ xl psr-mba-set 1 0xa > + > + 3. User can check the current configuration of the domain through > + 'psr-mab-show'. For linear mode, the decimal value is shown. > + > + root@:~$ xl psr-mba-show 1 > + Socket ID : 0 > + Default THRTL : 0 > + ID NAME THRTL > + 1 ubuntu14 10 The example seems better now IMHO. Thanks, Roger.
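Roger's review centres on how a requested throttling value is rounded in linear and non-linear mode. The following is a minimal C sketch of those two rules as the draft states them. It is not Xen's or the hardware's implementation. In linear mode a value is rounded down to a multiple of the precision `100 - MBA_MAX`. In non-linear mode it is rounded down to a power of two. The helper name `mba_round_thrtl` is hypothetical. Rejecting requests above MBA_MAX by returning -1 is an assumption, because the draft does not say how such input is handled.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of Xen): model the rounding the draft says
 * the hardware applies to a requested throttling value.
 * Returns the value that would take effect, or -1 if the request exceeds
 * thrtl_max (MBA_MAX); rejecting such requests is an assumption.
 */
static int mba_round_thrtl(unsigned int thrtl, unsigned int thrtl_max,
                           bool linear)
{
    if ( thrtl > thrtl_max )
        return -1;

    if ( linear )
    {
        /* Guard: precision 100 - MBA_MAX must be positive to be meaningful. */
        if ( thrtl_max >= 100 )
            return (int)thrtl;

        /* Precision (step size), e.g. 10 when MBA_MAX is 90. */
        unsigned int step = 100 - thrtl_max;

        /* Round down to a multiple of the step, e.g. 12 -> 10. */
        return (int)(thrtl - thrtl % step);
    }

    /* Non-linear mode: 0 stays 0, otherwise round down to a power of two. */
    if ( thrtl == 0 )
        return 0;

    unsigned int pow2 = 1;
    while ( pow2 <= thrtl / 2 )
        pow2 <<= 1;                 /* largest power of two <= thrtl */

    return (int)pow2;
}
```

With MBA_MAX = 90 in linear mode, a request of 12 becomes 10, matching the draft's example; in non-linear mode a request of 5 becomes 4.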
>>> Roger Pau Monné <roger.pau@citrix.com> 09/18/17 7:21 PM >>> >On Tue, Sep 05, 2017 at 05:32:23PM +0800, Yi Sun wrote: >> +## Hardware perspective >> + >> + MBA defines a range of MSRs to support specifying a delay value (Thrtl) per >> + COS, with details below. >> + >> + ``` >> + +----------------------------+----------------+ >> + | MSR (per socket) | Address | >> + +----------------------------+----------------+ >> + | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50 | >> + +----------------------------+----------------+ >> + | ... | ... | >> + +----------------------------+----------------+ >> + | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n | >> + +----------------------------+----------------+ >> + ``` >> + >> + When context switch happens, the COS ID of domain is written to per-thread MSR >> + `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according > >I think this is missing some context of the relation between a thread >and the MSR. I assume it's related to IA32_PQR_ASSOC, but I have no >idea what that constant means. > >What's more, Xen doesn't have threads, so you should maybe speak about >vCPUs instead? I think talk is of hardware aspects here, i.e. "thread" as in "hyper-thread". Jan
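As Jan notes, "per-thread" refers to hardware (hyper-)threads: each logical processor has its own `IA32_PQR_ASSOC`. The sketch below shows, in a hedged way, what composing the value written on a context switch could look like. It assumes the layout documented in the Intel SDM: MSR address 0xC8F, RMID in bits 9:0, and COS in bits 63:32. The helper `pqr_assoc_value` is hypothetical. The privileged WRMSR that the hypervisor would issue is deliberately omitted.

```c
#include <stdint.h>

/* Assumed per the Intel SDM: one IA32_PQR_ASSOC per logical processor. */
#define MSR_IA32_PQR_ASSOC  0xC8FU
#define PQR_ASSOC_RMID_MASK 0x3FFULL  /* bits 9:0: RMID used for monitoring */
#define PQR_ASSOC_COS_SHIFT 32        /* bits 63:32: class of service (COS) */

/*
 * Hypothetical helper: build the new IA32_PQR_ASSOC value when a hyper-thread
 * switches to a vCPU whose domain uses class of service `cos`.
 * The RMID bits of the current value are kept; the COS field is replaced.
 */
static uint64_t pqr_assoc_value(uint64_t old, uint32_t cos)
{
    return (old & PQR_ASSOC_RMID_MASK) |
           ((uint64_t)cos << PQR_ASSOC_COS_SHIFT);
}
```

The hardware then looks up the throttling value in the Thrtl MSR selected by that COS.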
On 17-09-19 00:07:36, Jan Beulich wrote: > >>> Roger Pau Monné <roger.pau@citrix.com> 09/18/17 7:21 PM >>> > >On Tue, Sep 05, 2017 at 05:32:23PM +0800, Yi Sun wrote: > >> +## Hardware perspective > >> + > >> + MBA defines a range of MSRs to support specifying a delay value (Thrtl) per > >> + COS, with details below. > >> + > >> + ``` > >> + +----------------------------+----------------+ > >> + | MSR (per socket) | Address | > >> + +----------------------------+----------------+ > >> + | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50 | > >> + +----------------------------+----------------+ > >> + | ... | ... | > >> + +----------------------------+----------------+ > >> + | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n | > >> + +----------------------------+----------------+ > >> + ``` > >> + > >> + When context switch happens, the COS ID of domain is written to per-thread MSR > >> + `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according > > > >I think this is missing some context of the relation between a thread > >and the MSR. I assume it's related to IA32_PQR_ASSOC, but I have no > >idea what that constant means. > > > >What's more, Xen doesn't have threads, so you should maybe speak about > >vCPUs instead? > > I think talk is of hardware aspects here, i.e. "thread" as in "hyper-thread". > > Jan > Indeed. Will make it more clear. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
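The MSR table quoted in this thread gives each COS its own throttling register at `0xD50 + n`. A minimal sketch of that mapping follows. The name `mba_thrtl_msr` is hypothetical. Returning 0 for a COS above the socket's MBA `cos_max` is an assumption that lets a caller detect that no Thrtl MSR exists for that COS.

```c
#include <stdint.h>

/* Base of the IA32_L2_QOS_Ext_BW_Thrtl_n range, per the table above. */
#define MSR_IA32_L2_QOS_EXT_BW_THRTL_BASE 0xD50U

/*
 * Hypothetical helper: address of IA32_L2_QOS_Ext_BW_Thrtl_n for class of
 * service `cos`, or 0 if `cos` exceeds the socket's MBA cos_max.
 */
static uint32_t mba_thrtl_msr(unsigned int cos, unsigned int cos_max)
{
    if ( cos > cos_max )
        return 0;

    return MSR_IA32_L2_QOS_EXT_BW_THRTL_BASE + cos;
}
```

For example, with a maximum COS of 7 (as in the `psr-hwinfo` output in this thread), COS 0 maps to 0xD50 and COS 7 to 0xD57.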
On 17-09-18 18:16:40, Roger Pau Monné wrote: > On Tue, Sep 05, 2017 at 05:32:23PM +0800, Yi Sun wrote: > > +* xl interfaces: > > + > > + 1. `psr-mba-show [domain-id]`: > > Is this limited to domain-id, or one can also use the domain name? > Most of the xl commands accept either a domain-id or a domain-name. > Both domain-id and domain-name can show it. I thought this is by default and no need to explicitly declare. If I am wrong, I will change it as below: `psr-mba-show [domain-id/domain-name]` [...] > > + 2. `psr-mba-set [OPTIONS] <domain-id> <throttling>`: > > + > > + Set memory bandwidth throttling for domain. > > + > > + Options: > > + '-s': Specify the socket to process, otherwise all sockets are processed. > > + > > + Throttling value set in register implies the approximate amount of delaying > > + the traffic between core and memory. The higher throttling value results in > > + lower bandwidth. The max throttling value (MBA_MAX) supported can be got > > s/got/obtained/ > Thanks! > > + through CPUID. > > How can one get this value empirically? Do I need to use a external > tool? > Sorry for confusion. In fact, the MBA_MAX is got through CPUID in hypervisor. User can know it through psr-hwinfo. Will explain it. > > + > > + Linear mode: the input precision is defined as 100-(MBA_MAX). For instance, > > + if the MBA_MAX value is 90, the input precision is 10%. Values not an even > > + multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10% > > + delay applied) by HW automatically. > > + > > + Non-linear mode: input delay values are powers-of-two from zero to the > > + MBA_MAX value from CPUID. In this case any values not a power of two will > > + be rounded down the next nearest power of two by HW automatically. > > Both of the above descriptions should be moved to mba-show IMHO, the > description there is incomplete and not helpful. > Ok, thanks! 
> > + > > +# Technical details > > + > > +MBA is a member of Intel PSR features, it shares the base PSR infrastructure > > +in Xen. > > + > > +## Hardware perspective > > + > > + MBA defines a range of MSRs to support specifying a delay value (Thrtl) per > > + COS, with details below. > > + > > + ``` > > + +----------------------------+----------------+ > > + | MSR (per socket) | Address | > > + +----------------------------+----------------+ > > + | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50 | > > + +----------------------------+----------------+ > > + | ... | ... | > > + +----------------------------+----------------+ > > + | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n | > > + +----------------------------+----------------+ > > + ``` > > + > > + When context switch happens, the COS ID of domain is written to per-thread MSR > > + `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according > > I think this is missing some context of the relation between a thread > and the MSR. I assume it's related to IA32_PQR_ASSOC, but I have no > idea what that constant means. > > What's more, Xen doesn't have threads, so you should maybe speak about > vCPUs instead? > As Jan's comment, this is for 'per-hyper-thread'. [...] > > +## Implementation Description > > + > > +* Hypervisor interfaces: > > + > > + 1. Boot line param: "psr=mba" to enable the feature. > > + > > + 2. SYSCTL: > > + - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information. > > So this is likely how one gets the mentioned MBA_MAX? > Yup. > > + > > + 3. DOMCTL: > > + - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain. > > + - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain. > > + > > +* xl interfaces: > > + > > + 1. psr-mba-show [domain-id] > > + Show system/domain runtime MBA throttling value. For linear mode, > > + it shows the decimal value. For non-linear mode, it shows hexadecimal > > + value. > > + => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL > > + > > + 2. 
psr-mba-set [OPTIONS] <domain-id> <throttling> > > + Set bandwidth throttling for a domain. > > + => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL > > + > > + 3. psr-hwinfo > > + Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA. > > + => XEN_SYSCTL_PSR_MBA_get_info > > 'psr-hwinfo' seems to be completely missing from the 'xl interfaces:' > section above. > Because this is not a newly added interface, I do not describe it in 'xl interfaces'. Is that necessary? > > +* Key data structure: > > + > > + 1. Feature HW info > > + > > + ``` > > + struct { > > + unsigned int thrtl_max; > > + bool linear; > > + } mba; > > + > > + - Member `thrtl_max` > > + > > + `thrtl_max` is the max throttling value to be set, i.e. MBA_MAX. > > + > > + - Member `linear` > > + > > + `linear` means the response of delay value is linear or not. > > + > > + As mentioned above, MBA is a member of Intel PSR features, it would > > + share the base PSR infrastructure in Xen. For example, the 'cos_max' > > + is a common HW property for all features. So, for other data structure > > + details, please refer 'intel_psr_cat_cdp.pandoc'. > ^ to Thanks! > > + > > +# Limitations > > + > > +MBA can only work on HW which enables it (check by CPUID). > ^ s/enables/supports/. Thanks! > > + > > +# Testing > > + > > +We can execute these commands to verify MBA on different HWs supporting them. > > + > > +For example: > > + 1. User can get the MBA hardware info through 'psr-hwinfo' command. From > > + result, user can know if this hardware works under linear mode or non- > > + linear mode, the max throttling value (MBA_MAX) and so on. > > + > > + root@:~$ xl psr-hwinfo --mba > > + Memory Bandwidth Allocation (MBA): > > + Socket ID : 0 > > + Linear Mode : Enabled > > + Maximum COS : 7 > > + Maximum Throttling Value: 90 > > + Default Throttling Value: 0 > > + > > + 2. Then, user can set a throttling value to a domain. For example, set '0xa', > > + i.e 10% delay. > > + > > + root@:~$ xl psr-mba-set 1 0xa > > + > > + 3. 
User can check the current configuration of the domain through > > + 'psr-mab-show'. For linear mode, the decimal value is shown. > > + > > + root@:~$ xl psr-mba-show 1 > > + Socket ID : 0 > > + Default THRTL : 0 > > + ID NAME THRTL > > + 1 ubuntu14 10 > > The example seems better now IMHO. > > Thanks, Roger.
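The testing steps set `0xa` and read back `10`, because `psr-mba-show` prints decimal in linear mode and hexadecimal in non-linear mode. The sketch below illustrates that round trip. It is not xl's actual code. The helpers `parse_thrtl` and `format_thrtl` are hypothetical, and so is the exact hexadecimal style (a `0x` prefix here). Parsing uses the standard `strtoul` with base 0, which accepts both `10` and `0xa`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: parse a throttling argument; base 0 accepts "10" or "0xa". */
static unsigned long parse_thrtl(const char *arg)
{
    return strtoul(arg, NULL, 0);
}

/* Hypothetical: print a throttling value the way the draft describes. */
static void format_thrtl(char *buf, size_t len, unsigned int thrtl, bool linear)
{
    if ( linear )
        snprintf(buf, len, "%u", thrtl);    /* linear mode: decimal, e.g. "10" */
    else
        snprintf(buf, len, "0x%x", thrtl);  /* non-linear mode: hex, e.g. "0x8" */
}
```

One caveat of base 0 is that a leading `0` is read as octal, so `010` parses as 8 rather than 10.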
On Wed, Sep 20, 2017 at 11:06:57AM +0800, Yi Sun wrote: > On 17-09-18 18:16:40, Roger Pau Monné wrote: > > On Tue, Sep 05, 2017 at 05:32:23PM +0800, Yi Sun wrote: > > > +* xl interfaces: > > > + > > > + 1. `psr-mba-show [domain-id]`: > > > > Is this limited to domain-id, or one can also use the domain name? > > Most of the xl commands accept either a domain-id or a domain-name. > > > Both domain-id and domain-name can show it. I thought this is by default and > no need to explicitly declare. If I am wrong, I will change it as below: > `psr-mba-show [domain-id/domain-name]` [domain-id|domain-name] Would be better IMHO. > > > + > > > + 3. DOMCTL: > > > + - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain. > > > + - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain. > > > + > > > +* xl interfaces: > > > + > > > + 1. psr-mba-show [domain-id] > > > + Show system/domain runtime MBA throttling value. For linear mode, > > > + it shows the decimal value. For non-linear mode, it shows hexadecimal > > > + value. > > > + => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL > > > + > > > + 2. psr-mba-set [OPTIONS] <domain-id> <throttling> > > > + Set bandwidth throttling for a domain. > > > + => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL > > > + > > > + 3. psr-hwinfo > > > + Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA. > > > + => XEN_SYSCTL_PSR_MBA_get_info > > > > 'psr-hwinfo' seems to be completely missing from the 'xl interfaces:' > > section above. > > > Because this is not a newly added interface, I do not describe it in 'xl > interfaces'. Is that necessary? Oh, OK, sorry for the noise. Then I guess it's not necessary to describe it here. Maybe a reference to where 'psr-hwinfo' is described would be nice (I assume there's a feature document somewhere that describes 'psr-hwinfo'). Roger.
On 17-09-20 09:36:13, Roger Pau Monné wrote: > On Wed, Sep 20, 2017 at 11:06:57AM +0800, Yi Sun wrote: > > On 17-09-18 18:16:40, Roger Pau Monné wrote: > > > On Tue, Sep 05, 2017 at 05:32:23PM +0800, Yi Sun wrote: > > > > +* xl interfaces: > > > > + > > > > + 1. `psr-mba-show [domain-id]`: > > > > > > Is this limited to domain-id, or one can also use the domain name? > > > Most of the xl commands accept either a domain-id or a domain-name. > > > > > Both domain-id and domain-name can show it. I thought this is by default and > > no need to explicitly declare. If I am wrong, I will change it as below: > > `psr-mba-show [domain-id/domain-name]` > > [domain-id|domain-name] > > Would be better IMHO. > Thanks! > > > > + > > > > + 3. DOMCTL: > > > > + - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain. > > > > + - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain. > > > > + > > > > +* xl interfaces: > > > > + > > > > + 1. psr-mba-show [domain-id] > > > > + Show system/domain runtime MBA throttling value. For linear mode, > > > > + it shows the decimal value. For non-linear mode, it shows hexadecimal > > > > + value. > > > > + => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL > > > > + > > > > + 2. psr-mba-set [OPTIONS] <domain-id> <throttling> > > > > + Set bandwidth throttling for a domain. > > > > + => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL > > > > + > > > > + 3. psr-hwinfo > > > > + Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA. > > > > + => XEN_SYSCTL_PSR_MBA_get_info > > > > > > 'psr-hwinfo' seems to be completely missing from the 'xl interfaces:' > > > section above. > > > > > Because this is not a newly added interface, I do not describe it in 'xl > > interfaces'. Is that necessary? > > Oh, OK, sorry for the noise. Then I guess it's not necessary to > describe it here. Maybe a reference to where 'psr-hwinfo' is described > would be nice (I assume there's a feature document somewhere that > describes 'psr-hwinfo'). 
> psr-hwinfo is firstly introduced in intel_psr_cat_cdp.pandoc. But MBA feature adds a new sysctl interface 'XEN_SYSCTL_PSR_MBA_get_info' which is used by psr-hwinfo. So, I describe it here again. > Roger.
diff --git a/docs/features/intel_psr_mba.pandoc b/docs/features/intel_psr_mba.pandoc new file mode 100644 index 0000000..693ef45 --- /dev/null +++ b/docs/features/intel_psr_mba.pandoc @@ -0,0 +1,283 @@ +% Intel Memory Bandwidth Allocation (MBA) Feature +% Revision 1.5 + +\clearpage + +# Basics + +---------------- ---------------------------------------------------- + Status: **Tech Preview** + +Architecture(s): Intel x86 + + Component(s): Hypervisor, toolstack + + Hardware: MBA is supported on Skylake Server and beyond +---------------- ---------------------------------------------------- + +# Terminology + +* CAT Cache Allocation Technology +* CBM Capacity BitMasks +* CDP Code and Data Prioritization +* COS/CLOS Class of Service +* HW Hardware +* MBA Memory Bandwidth Allocation +* MSRs Machine Specific Registers +* PSR Intel Platform Shared Resource +* THRTL Throttle value or delay value + +# Overview + +The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate +control over memory bandwidth available per-core. This feature provides OS/ +hypervisor the ability to slow misbehaving apps/domains by using a credit-based +throttling mechanism. + +# User details + +* Feature Enabling: + + Add "psr=mba" to boot line parameter to enable MBA feature. + +* xl interfaces: + + 1. `psr-mba-show [domain-id]`: + + Show memory bandwidth throttling for domain. Under different modes, it + shows different type of data. + + There are two modes: + Linear mode: the response of throttling value is linear. + Non-linear mode: the response of throttling value is non-linear. + + For linear mode, it shows the decimal value. For non-linear mode, it shows + hexadecimal value. + + 2. `psr-mba-set [OPTIONS] <domain-id> <throttling>`: + + Set memory bandwidth throttling for domain. + + Options: + '-s': Specify the socket to process, otherwise all sockets are processed. 
+ + Throttling value set in register implies the approximate amount of delaying + the traffic between core and memory. The higher throttling value results in + lower bandwidth. The max throttling value (MBA_MAX) supported can be got + through CPUID. + + Linear mode: the input precision is defined as 100-(MBA_MAX). For instance, + if the MBA_MAX value is 90, the input precision is 10%. Values not an even + multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10% + delay applied) by HW automatically. + + Non-linear mode: input delay values are powers-of-two from zero to the + MBA_MAX value from CPUID. In this case any values not a power of two will + be rounded down the next nearest power of two by HW automatically. + +# Technical details + +MBA is a member of Intel PSR features, it shares the base PSR infrastructure +in Xen. + +## Hardware perspective + + MBA defines a range of MSRs to support specifying a delay value (Thrtl) per + COS, with details below. + + ``` + +----------------------------+----------------+ + | MSR (per socket) | Address | + +----------------------------+----------------+ + | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50 | + +----------------------------+----------------+ + | ... | ... | + +----------------------------+----------------+ + | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n | + +----------------------------+----------------+ + ``` + + When context switch happens, the COS ID of domain is written to per-thread MSR + `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according + to the throttling value stored in the Thrtl MSR register. + +## The relationship between MBA and CAT/CDP + + Generally speaking, MBA is completely independent of CAT/CDP, and any + combination may be applied at any time, e.g. enabling MBA with CAT + disabled. 
+ + But it needs to be noticed that MBA shares COS infrastructure with CAT, + although MBA is enumerated by different CPUID leaf from CAT (which + indicates that the max COS of MBA may be different from CAT). In some + cases, a domain is permitted to have a COS that is beyond one (or more) + of PSR features but within the others. For instance, let's assume the max + COS of MBA is 8 but the max COS of L3 CAT is 16, when a domain is assigned + 9 as COS, the L3 CAT CBM associated to COS 9 would be enforced, but for MBA, + the HW works as default value is set since COS 9 is beyond the max COS (8) + of MBA. + +## Design Overview + +* Core COS/Thrtl association + + When enforcing Memory Bandwidth Allocation, all cores of domains have + the same default Thrtl MSR (COS0) which stores the same Thrtl (0). The + default Thrtl MSR is used only in hypervisor and is transparent to tool stack + and user. + + System administrators can change PSR allocation policy at runtime by + using the tool stack. Since MBA shares COS ID with CAT/CDP, a COS ID + corresponds to a 2-tuple, like [CBM, Thrtl] with only-CAT enabled, when CDP + is enabled, the COS ID corresponds to a 3-tuple, like [Code_CBM, Data_CBM, + Thrtl]. If neither CAT nor CDP is enabled, things are easier, since one COS + ID corresponds to one Thrtl. + +* VCPU schedule + + This part reuses CAT COS infrastructure. + +* Multi-sockets + + Different sockets may have different MBA ability (like max COS) + although it is consistent on the same socket. So the capability + of per-socket MBA is specified. + + This part reuses CAT COS infrastructure. + +## Implementation Description + +* Hypervisor interfaces: + + 1. Boot line param: "psr=mba" to enable the feature. + + 2. SYSCTL: + - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information. + + 3. DOMCTL: + - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain. + - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain. + +* xl interfaces: + + 1. 
psr-mba-show [domain-id] + Show system/domain runtime MBA throttling value. For linear mode, + it shows the decimal value. For non-linear mode, it shows hexadecimal + value. + => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL + + 2. psr-mba-set [OPTIONS] <domain-id> <throttling> + Set bandwidth throttling for a domain. + => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL + + 3. psr-hwinfo + Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA. + => XEN_SYSCTL_PSR_MBA_get_info + +* Key data structure: + + 1. Feature HW info + + ``` + struct { + unsigned int thrtl_max; + bool linear; + } mba; + + - Member `thrtl_max` + + `thrtl_max` is the max throttling value to be set, i.e. MBA_MAX. + + - Member `linear` + + `linear` means the response of delay value is linear or not. + + As mentioned above, MBA is a member of Intel PSR features, it would + share the base PSR infrastructure in Xen. For example, the 'cos_max' + is a common HW property for all features. So, for other data structure + details, please refer 'intel_psr_cat_cdp.pandoc'. + +# Limitations + +MBA can only work on HW which enables it (check by CPUID). + +# Testing + +We can execute these commands to verify MBA on different HWs supporting them. + +For example: + 1. User can get the MBA hardware info through 'psr-hwinfo' command. From + result, user can know if this hardware works under linear mode or non- + linear mode, the max throttling value (MBA_MAX) and so on. + + root@:~$ xl psr-hwinfo --mba + Memory Bandwidth Allocation (MBA): + Socket ID : 0 + Linear Mode : Enabled + Maximum COS : 7 + Maximum Throttling Value: 90 + Default Throttling Value: 0 + + 2. Then, user can set a throttling value to a domain. For example, set '0xa', + i.e 10% delay. + + root@:~$ xl psr-mba-set 1 0xa + + 3. User can check the current configuration of the domain through + 'psr-mab-show'. For linear mode, the decimal value is shown. 
+ + root@:~$ xl psr-mba-show 1 + Socket ID : 0 + Default THRTL : 0 + ID NAME THRTL + 1 ubuntu14 10 + +# Areas for improvement + +N/A + +# Known issues + +N/A + +# References + +"INTEL RESOURCE DIRECTOR TECHNOLOGY (INTEL RDT) ALLOCATION FEATURES" [Intel 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html) + +# History + +------------------------------------------------------------------------ +Date Revision Version Notes +---------- -------- -------- ------------------------------------------- +2017-01-10 1.0 Xen 4.9 Design document written +2017-07-10 1.1 Xen 4.10 Changes: + 1. Modify data structure according to latest + codes; + 2. Add content for 'Areas for improvement'; + 3. Other minor changes. +2017-08-09 1.2 Xen 4.10 Changes: + 1. Remove a special character to avoid error when + building pandoc. +2017-08-15 1.3 Xen 4.10 Changes: + 1. Add terminology 'HW'. + 2. Change 'COS ID of VCPU' to 'COS ID of domain'. + 3. Change 'COS register' to 'Thrtl MSR'. + 4. Explain the value shown for 'psr-mba-show' under + different modes. + 5. Remove content in 'Areas for improvement'. +2017-08-16 1.4 Xen 4.10 Changes: + 1. Add '<>' for mandatory argument. +2017-08-30 1.5 Xen 4.10 Changes: + 1. Modify words in 'Overview' to make it easier to + understand. + 2. Explain 'linear/non-linear' modes before mention + them. + 3. Explain throttling value more accurate. + 4. Explain 'MBA_MAX'. + 5. Correct some words in 'Design Overview'. + 6. Change 'mba_info' to 'mba' according to code + changes. Also, modify contents of it. + 7. Add context in 'Testing' part to make things + more clear. + 8. Remove 'n<64' to avoid out-of-sync. +---------- -------- -------- -------------------------------------------
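The "relationship between MBA and CAT/CDP" section says a domain may hold a COS that CAT accepts but MBA does not. In that case the hardware behaves as if the default throttling value (0) were set. A minimal sketch of that rule follows, built around the `mba` HW info described under "Key data structure". In the document `cos_max` is a common PSR property kept outside `mba`. Folding it into one struct is a simplification, and the names `mba_hw_info` and `mba_effective_thrtl` are hypothetical.

```c
#include <stdbool.h>

/* Simplified view of the MBA HW info; cos_max is folded in for brevity. */
struct mba_hw_info {
    unsigned int cos_max;    /* highest COS supported by MBA on this socket */
    unsigned int thrtl_max;  /* MBA_MAX */
    bool linear;             /* linear or non-linear delay response */
};

#define MBA_DEFAULT_THRTL 0U /* default throttling value (no delay) */

/*
 * Hypothetical helper: throttling value effectively applied for `cos`,
 * given a per-COS table `thrtl` holding cos_max + 1 entries.
 * A COS beyond MBA's cos_max gets the default, even if CAT accepts it.
 */
static unsigned int mba_effective_thrtl(const struct mba_hw_info *info,
                                        const unsigned int *thrtl,
                                        unsigned int cos)
{
    if ( cos > info->cos_max )
        return MBA_DEFAULT_THRTL;

    return thrtl[cos];
}
```

Using the document's example, with an MBA maximum COS of 8 and an L3 CAT maximum COS of 16, a domain assigned COS 9 still has its L3 CAT CBM enforced, while this helper reports the MBA default of 0.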
This patch creates the MBA feature document in docs/features/. It describes the key points of implementing MBA, which is described in detail in the Intel SDM section "Introduction to Memory Bandwidth Allocation".

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
v3:
    - remove 'closed-loop' related description.
      (suggested by Roger Pau Monné)
    - explain 'linear' and 'non-linear' before mentioning them.
      (suggested by Roger Pau Monné)
    - adjust description of 'psr-mba-set'.
      (suggested by Roger Pau Monné)
    - explain 'MBA_MAX'.
      (suggested by Roger Pau Monné)
    - remove 'n<64'.
      (suggested by Roger Pau Monné)
    - fix some wording.
      (suggested by Roger Pau Monné)
    - add context in 'Testing' part to make things clearer.
      (suggested by Roger Pau Monné)
v2:
    - declare 'HW' in Terminology.
      (suggested by Chao Peng)
    - replace 'COS ID of VCPU' with 'COS ID of domain'.
      (suggested by Chao Peng)
    - replace 'COS register' with 'Thrtl MSR'.
      (suggested by Chao Peng)
    - add description for 'psr-mba-show' to state that the decimal value is
      shown for linear mode but hexadecimal value is shown for non-linear mode.
      (suggested by Chao Peng)
    - remove content in 'Areas for improvement'.
      (suggested by Chao Peng)
    - use '<>' to specify mandatory argument to a command.
      (suggested by Wei Liu)
v1:
    - remove a special character to avoid the error when building pandoc.
---
 docs/features/intel_psr_mba.pandoc | 283 +++++++++++++++++++++++++++++++++++++
 1 file changed, 283 insertions(+)
 create mode 100644 docs/features/intel_psr_mba.pandoc