[RFC,01/16] docs: create Memory Bandwidth Allocation (MBA) feature document.

Message ID 1484034155-4521-2-git-send-email-yi.y.sun@linux.intel.com (mailing list archive)
State New, archived
Commit Message

Yi Sun Jan. 10, 2017, 7:42 a.m. UTC
This patch creates the MBA feature document in docs/features/. It
describes the details of MBA.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 docs/features/intel_psr_mba.pandoc | 226 +++++++++++++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 docs/features/intel_psr_mba.pandoc

Comments

Meng Xu Feb. 23, 2017, 8:46 p.m. UTC | #1
Hi Yi,

I have some quick comments about this document. Some minor points are
not very clear, IMHO.

> +
> +  2. `psr-mba-set [OPTIONS] domain-id throttling`:
> +
> +     Set memory bandwidth throttling for domain.
> +
> +     Options:
> +     '-s': Specify the socket to process, otherwise all sockets are processed.
> +
> +     Throttling value set in register implies memory bandwidth blocked, i.e.
> +     higher throttling value results in lower bandwidth. The max throttling
> +     value can be got through CPUID.
> +
> +     The response of the throttling value could be linear mode or non-linear
> +     mode.
> +
> +     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
> +     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
> +     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
> +     delay applied) by HW automatically.

So MBA has a minimum allocation unit. What is the minimum bandwidth
allocation unit?
From the above example, I had the impression that the allocation unit is 10%.
As mentioned in the document later, the throttle value is set in the
COS register's  Thrtl bit fields as shown in [Code_CBM, Data_CBM,
Thrtl]. I had the impression that the maximum number of bandwidth
units we can allocate is 2^number_of_bits_in_Thrtl.
Only one of my impressions can be true, right? ;-)

In addition, since the hardware will round down the partial bandwidth
value, why shouldn't we just allow system operators to configure only
the "valid" bandwidth values supported by the hardware?
For example, if the hardware only supports the bandwidth throttle
value in 10% units, then we should not allow users to input the
bandwidth throttle value as 12% or 13%. Otherwise, as a system
operator, I would be confused about why, after increasing the bandwidth
throttle value from 11% to 19%, I still see the same bandwidth
guarantee.

> +
> +     Non-linear mode: input delay values are powers-of-two from zero to the
> +     MBA_MAX value from CPUID. In this case any values not a power of two will
> +     be rounded down the next nearest power of two by HW automatically.

First question: Why is it the delay value instead of bandwidth value
in the non-linear mode? Does MBA really control memory access latency?

Second question: Does the hardware provide any guaranteed bandwidth in
the non-linear mode?
I saw the document patch in Linux at
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1307176.html:
[Quote]
In nonlinear scale currently SDM specifies
+throttle values in 2^n values. However the h/w does not guarantee a
+specific curve for the amount of memory b/w that is actually throttled.
+But for any thrtl_by value x > y, its guaranteed that x would throttle
+more b/w than y.  The info directory specifies the max thrtl_by value
+and thrtl_by granularity.
[/Quote]

It seems that the non-linear mode simply provides some throttling
ordering but doesn't guarantee the actual throttle value.
Maybe it would be good to clearly state the capabilities and limitations
of the hardware.

> +  System administrator can change PSR allocation policy at runtime by
> +  tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds to a
> +  2-tuple, like [CBM, Thrtl] with only-CAT enalbed, when CDP is enable,
> +  the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM, Thrtl]. If
> +  neither CAT nor CDP is enabled, things would be easier, one COS
> +  corresponds to one Thrtl.

How many bits in Thrtl field?
Is it decided by the hardware type?

> +# References
> +
> +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> +

I checked the document. The CAT is in Chapter 17.17. However, there is
no description about the MBA? ;-)

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Yi Sun Feb. 24, 2017, 5:07 a.m. UTC | #2
On 17-02-23 15:46:36, Meng Xu wrote:
> Hi Yi,
> 
> I have some quick comment about this document. Some minor points are
> not very clear, IMHO.
> 
Thanks for your mail!

> > +
> > +  2. `psr-mba-set [OPTIONS] domain-id throttling`:
> > +
> > +     Set memory bandwidth throttling for domain.
> > +
> > +     Options:
> > +     '-s': Specify the socket to process, otherwise all sockets are processed.
> > +
> > +     Throttling value set in register implies memory bandwidth blocked, i.e.
> > +     higher throttling value results in lower bandwidth. The max throttling
> > +     value can be got through CPUID.
> > +
> > +     The response of the throttling value could be linear mode or non-linear
> > +     mode.
> > +
> > +     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
> > +     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
> > +     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
> > +     delay applied) by HW automatically.
> 
> So MBA has a minimum allocation unit. What is the minimum bandwidth
> allocation unit?
> From the above example, I had the impression that the allocation unit is 10%.
> As mentioned in the document later, the throttle value is set in the
> COS register's  Thrtl bit fields as shown in [Code_CBM, Data_CBM,
> Thrtl]. I had the impression that the maximum number of bandwidth
> units we can allocate is 2^number_of_bits_in_Thrtl.
> Only one of my impression could be true, right? ;-)
> 
MBA supports two modes by design. One is linear mode, with steps such as 10%.
The other is non-linear mode, with power-of-2 steps. So, which mode is
supported depends on the HW info. :)
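
To make the two modes concrete, here is a minimal sketch of the rounding as
the feature document describes it. It is only a model: the helper names are
hypothetical and are not part of this series, and the real rounding is done
by HW.

```c
#include <stdint.h>

/*
 * Hypothetical model of the rounding described in the feature document.
 * These helpers are illustrative only; they are not part of the series.
 */

/* Linear mode: the step is 100 - MBA_MAX (10 when MBA_MAX is 90). */
uint32_t mba_round_linear(uint32_t thrtl, uint32_t mba_max)
{
    uint32_t step;

    if (mba_max >= 100)          /* no usable step; leave the value alone */
        return thrtl;

    step = 100 - mba_max;
    return thrtl - (thrtl % step);
}

/* Non-linear mode: round down to a power of two; zero stays zero. */
uint32_t mba_round_pow2(uint32_t thrtl)
{
    uint32_t p = 1;

    if (thrtl == 0)
        return 0;

    /* Double p while the next power of two still fits under thrtl. */
    while (p <= thrtl / 2)
        p *= 2;

    return p;
}
```

With MBA_MAX of 90, mba_round_linear(12, 90) gives 10, which matches the
12% -> 10% example in the document.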

> In addition, since hardware will round down the partial bandwidth
> value, why shouldn't we just allow system operators to configure the
> "valid" bandwidth supported by the hardware.
> For example, if the hardware only supports the bandwidth  throttle
> value in 10% units, then we should not allow users to input the
> bandwidth throttle value as 12% or 13%. Otherwise, as a system
> operator, I would be confused at why I increased the bandwidth
> throttle value from 11% to 19%, I still see the same bandwidth
> guarantee.
> 
That is an option to implement in libxl or in an even higher layer.
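
A rough sketch of what such a check could look like in the toolstack is
below. It reuses the fields of `psr_mba_hw_info` from the patch, but the
function `mba_thrtl_is_valid` is hypothetical and does not exist in libxl,
and the step of 100 - thrtl_max is an assumption taken from the document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same fields as the psr_mba_hw_info structure in the patch. */
struct psr_mba_hw_info {
    unsigned int thrtl_max;
    unsigned int cos_max;
    unsigned int linear;
};

/* Hypothetical toolstack-side check; not an existing libxl function. */
bool mba_thrtl_is_valid(const struct psr_mba_hw_info *info, uint32_t thrtl)
{
    if (thrtl > info->thrtl_max)
        return false;

    if (info->linear) {
        /* Assumption: the step is 100 - thrtl_max, as in the document. */
        if (info->thrtl_max >= 100)
            return true;
        return thrtl % (100 - info->thrtl_max) == 0;
    }

    /* Non-linear mode: accept zero or an exact power of two. */
    return (thrtl & (thrtl - 1)) == 0;
}
```

With thrtl_max of 90 in linear mode, this accepts 10 and rejects 12, so the
user gets an error instead of a silently rounded value.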

> > +
> > +     Non-linear mode: input delay values are powers-of-two from zero to the
> > +     MBA_MAX value from CPUID. In this case any values not a power of two will
> > +     be rounded down the next nearest power of two by HW automatically.
> 
> First question: Why is it the delay value instead of bandwidth value
> in the non-linear mode? Does MBA really control memory access latency?
> 
MBA directly controls the latency in order to indirectly control the bandwidth.
You can see the description in the SDM:
"The Memory Bandwidth Allocation (MBA) feature provides indirect and
approximate control over memory bandwidth available per-core"

> Second question: Does the hardware provide any guaranteed bandwidth in
> the non-linear mode?
Nope. As mentioned above, only "approximate control" is provided, in both
linear mode and non-linear mode.

> I saw the document patch in Linux at
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1307176.html:
> [Qutoe]
> In nonlinear scale currently SDM specifies
> +throttle values in 2^n values. However the h/w does not guarantee a
> +specific curve for the amount of memory b/w that is actually throttled.
> +But for any thrtl_by value x > y, its guaranteed that x would throttle
> +more b/w than y.  The info directory specifies the max thrtl_by value
> +and thrtl_by granularity.
> [/Qutoe]
> 
> It seems that the non-linear mode simply provide some throttling
> relations but don't guarantee the actual throttle value.
> Maybe it will be good to clearly state the capability and limitations
> of the hardware.
> 
Sorry, there is no such info in the SDM. But you can use the MBM (Memory
Bandwidth Monitoring) feature to observe the actual effect of MBA.

> > +  System administrator can change PSR allocation policy at runtime by
> > +  tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds to a
> > +  2-tuple, like [CBM, Thrtl] with only-CAT enalbed, when CDP is enable,
> > +  the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM, Thrtl]. If
> > +  neither CAT nor CDP is enabled, things would be easier, one COS
> > +  corresponds to one Thrtl.
> 
> How many bits in Thrtl field?
> Is it decided by the hardware type?
> 
This is defined in SDM.
"The definition for the MBA delay value MSRs is provided in Figure 17.39. The
lower 16 bits are used for MBA delay values, and values from zero to the maximum
from the CPUID MBA_MAX-1 value are supported."

Please note that the MBA value is different from a CBM. You do not need to
care about the bits.
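
For illustration only, here is a minimal sketch of how the MSR addresses
from the document's table and the 16-bit delay field fit together. The macro
and function names are hypothetical and are not taken from this series.

```c
#include <stdint.h>

/* IA32_L2_QOS_Ext_BW_Thrtl_0, from the table in the document. */
#define MBA_THRTL_MSR_BASE  0xD50u
/* The table gives 0xD50+n with n < 64. */
#define MBA_THRTL_MSR_COUNT 64u
/* The lower 16 bits hold the delay value, per the SDM text quoted above. */
#define MBA_THRTL_MASK      0xFFFFu

/* Hypothetical helper: MSR address for COS n, or 0 if n is out of range. */
uint32_t mba_thrtl_msr(uint32_t cos)
{
    if (cos >= MBA_THRTL_MSR_COUNT)
        return 0;
    return MBA_THRTL_MSR_BASE + cos;
}

/* Hypothetical helper: extract the delay value from a raw MSR value. */
uint32_t mba_thrtl_from_msr(uint64_t msr_val)
{
    return (uint32_t)(msr_val & MBA_THRTL_MASK);
}
```

The delay value is a plain number in that field, not a bitmask like a CBM.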

> > +# References
> > +
> > +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> > +
> 
> I checked the document. The CAT is in Chapter 17.17. However, there is
> no description about the MBA? ;-)
Have you downloaded the latest SDM? Section 17.18.7 covers MBA.

> 
> Thanks,
> 
> Meng
> 
> -----------
> Meng Xu
> PhD Student in Computer and Information Science
> University of Pennsylvania
> http://www.cis.upenn.edu/~mengxu/
Meng Xu Feb. 24, 2017, 3:53 p.m. UTC | #3
>> > +  System administrator can change PSR allocation policy at runtime by
>> > +  tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds to a
>> > +  2-tuple, like [CBM, Thrtl] with only-CAT enalbed, when CDP is enable,
>> > +  the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM, Thrtl]. If
>> > +  neither CAT nor CDP is enabled, things would be easier, one COS
>> > +  corresponds to one Thrtl.
>>
>> How many bits in Thrtl field?
>> Is it decided by the hardware type?
>>
> This is defined in SDM.
> "The definition for the MBA delay value MSRs is provided in Figure 17.39. The
> lower 16 bits are used for MBA delay values, and values from zero to the maximum
> from the CPUID MBA_MAX-1 value are supported."
>
> Please note, MBA value is different with CBM. You do not need care the bits.
>
>> > +# References
>> > +
>> > +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
>> > +
>>
>> I checked the document. The CAT is in Chapter 17.17. However, there is
>> no description about the MBA? ;-)
> Have you downloaded latest SDM? 17.18.7 is for MBA.

Ah-ha, I see it now. I guess I downloaded the old version. :-)

I find this MBA feature interesting. Is there any processor on the
market that we can purchase?
We'd like to evaluate this feature. ;-)

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Yi Sun Feb. 27, 2017, 4:39 a.m. UTC | #4
On 17-02-24 10:53:16, Meng Xu wrote:
> >> > +# References
> >> > +
> >> > +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> >> > +
> >>
> >> I checked the document. The CAT is in Chapter 17.17. However, there is
> >> no description about the MBA? ;-)
> > Have you downloaded latest SDM? 17.18.7 is for MBA.
> 
> Ah-ha, I saw it now. I guess I downloaded the old version. :-)
> 
> I found this MBA feature is interesting. Is there any processor on the
> market we can purchase?
> We'd like to evaluate this feature. ;-)
> 
As far as I know, Skylake, which enables MBA, will be released in the middle
of this year.

> Thanks,
> 
> Meng
> 
> -----------
> Meng Xu
> PhD Student in Computer and Information Science
> University of Pennsylvania
> http://www.cis.upenn.edu/~mengxu/
Patch

diff --git a/docs/features/intel_psr_mba.pandoc b/docs/features/intel_psr_mba.pandoc
new file mode 100644
index 0000000..8c04f01
--- /dev/null
+++ b/docs/features/intel_psr_mba.pandoc
@@ -0,0 +1,226 @@ 
+% Intel Memory Bandwidth Allocation (MBA) Feature
+% Revision 1.0
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Tech Preview**
+
+Architecture(s): Intel x86
+
+   Component(s): Hypervisor, toolstack
+
+       Hardware: MBA is supported on Skylake Server and beyond
+---------------- ----------------------------------------------------
+
+# Overview
+
+The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
+control over memory bandwidth available per-core. This feature provides OS/VMMs
+the ability to slow misbehaving apps/VMs or create advanced closed-loop control
+system via exposing control over a credit-based throttling mechanism.
+
+## Terminology
+
+* CAT         Cache Allocation Technology
+* COS/CLOS    Class of Service
+* MSRs        Machine Specific Registers
+* PSR         Intel Platform Shared Resource
+* VMM         Virtual Machine Monitor
+* THRTL       Throttle value or delay value
+
+# User details
+
+* Feature Enabling:
+
+  Add "psr=mba" to the boot line parameters to enable the MBA feature.
+
+* xl interfaces:
+
+  1. `psr-mba-show [domain-id]`:
+
+     Show system/domain MBA information.
+
+  2. `psr-mba-set [OPTIONS] domain-id throttling`:
+
+     Set memory bandwidth throttling for domain.
+
+     Options:
+     '-s': Specify the socket to process, otherwise all sockets are processed.
+
+     The throttling value set in the register represents the memory bandwidth
+     blocked, i.e. a higher throttling value results in lower bandwidth. The
+     max throttling value can be obtained through CPUID.
+
+     The response to the throttling value is either linear or non-linear,
+     depending on the hardware.
+
+     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
+     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
+     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
+     delay applied) by HW automatically.
+
+     Non-linear mode: input delay values are powers-of-two from zero to the
+     MBA_MAX value from CPUID. In this case any values not a power of two will
+     be rounded down to the nearest power of two by HW automatically.
+
+# Technical details
+
+MBA is a member of the Intel PSR features and shares some base PSR
+infrastructure in Xen.
+
+## Hardware perspective
+
+MBA provides an architecturally consistent method to map cores to a Class
+of Service (COS). This infrastructure is shared with the previously
+introduced CAT technologies.
+
+Furthermore, MBA also defines a new range of MSRs to support specifying a
+delay value (Thrtl) per COS, with details below.
+
++----------------------------+----------------+
+| MSR (per socket)           |    Address     |
++----------------------------+----------------+
+| IA32_L2_QOS_Ext_BW_Thrtl_0 |     0xD50      |
++----------------------------+----------------+
+| ...                        |  ...           |
++----------------------------+----------------+
+| IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n (n<64) |
++----------------------------+----------------+
+
+When a context switch happens, the COS of the VCPU is written to the
+per-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces bandwidth
+allocation according to the throttling value corresponding to the COS.
+
+## The relationship between MBA and CAT/CDP
+
+Generally speaking, MBA is completely independent of CAT/CDP, and any
+combination may be applied at any time, e.g. enabling MBA with CAT
+disabled.
+
+Note, however, that MBA shares the COS infrastructure with CAT,
+although MBA is enumerated by a different CPUID leaf from CAT (which
+means that the max COS of MBA may differ from that of CAT).
+
+## Design Overview
+
+* Core COS/Thrtl association
+
+  When enforcing Memory Bandwidth Allocation, all cores of domains have
+  the same default COS (COS0), which corresponds to the same Thrtl (0).
+  The default COS is used only in the hypervisor and is transparent to the
+  tool stack and user.
+
+  The system administrator can change the PSR allocation policy at runtime
+  via the tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds
+  to a 2-tuple, like [CBM, Thrtl], when only CAT is enabled; when CDP is
+  enabled, the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM,
+  Thrtl]. If neither CAT nor CDP is enabled, things are simpler: one COS
+  corresponds to one Thrtl.
+
+* VCPU schedule
+
+  This part reuses CAT COS infrastructure.
+
+* Multi-sockets
+
+  Different sockets may have different MBA capabilities (like max COS),
+  although they are consistent within the same socket. So the MBA
+  capability is specified per socket.
+
+## Implementation Description
+
+* Hypervisor interfaces:
+
+  1. Boot line param: "psr=mba" to enable the feature.
+
+  2. SYSCTL:
+          - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information.
+
+  3. DOMCTL:
+          - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get Throttling for a domain.
+          - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set Throttling for a domain.
+
+* xl interfaces:
+
+  1. psr-mba-show [domain-id]
+          Show system/runtime MBA information.
+          => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL
+
+  2. psr-mba-set [OPTIONS] domain-id throttling
+          Set bandwidth throttling for a domain.
+          => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL
+
+* Key data structure:
+
+  1. Feature HW info
+
+     ```
+     struct psr_mba_hw_info {
+         unsigned int thrtl_max;
+         unsigned int cos_max;
+         unsigned int linear;
+     };
+     ```
+     - Member `thrtl_max`
+
+       `thrtl_max` is the max throttling value to be set.
+
+     - Member `cos_max`
+
+       `cos_max` is the maximum COS ID; it is part of the hardware info, as for CAT.
+
+     - Member `linear`
+
+       `linear` indicates whether the response of the delay value is linear.
+
+     As mentioned above, MBA is a member of the Intel PSR features and
+     shares some base PSR infrastructure in Xen. So, for other data structure
+     details, please refer to 'intel_psr_l2_cat.pandoc'.
+
+# Limitations
+
+MBA only works on HW that supports it (checked via CPUID).
+
+# Testing
+
+We can execute these commands to verify MBA on different HW that supports it.
+
+For example:
+    root@:~$ xl psr-hwinfo --mba
+    Memory Bandwidth Allocation (MBA):
+    Socket ID       : 0
+    Linear Mode     : Enabled
+    Maximum COS     : 7
+    Maximum Throttling Value: 90
+    Default Throttling Value: 0
+
+    root@:~$ xl psr-mba-set 1 0xa
+
+    root@:~$ xl psr-mba-show 1
+    Socket ID       : 0
+    Default THRTL   : 0
+       ID                     NAME           THRTL
+        1                 ubuntu14             0xa
+
+# Areas for improvement
+
+N/A
+
+# Known issues
+
+N/A
+
+# References
+
+"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2017-01-10 1.0      Xen 4.9  Design document written
+---------- -------- -------- -------------------------------------------