new file mode 100644
@@ -0,0 +1,244 @@
+% Intel Memory Bandwidth Allocation (MBA) Feature
+% Revision 1.1
+
+\clearpage
+
+# Basics
+
+----------------  ----------------------------------------------------
+         Status:  **Tech Preview**
+
+Architecture(s):  Intel x86
+
+   Component(s):  Hypervisor, toolstack
+
+       Hardware:  MBA is supported on Skylake Server and beyond
+----------------  ----------------------------------------------------
+
+# Terminology
+
+* CAT       Cache Allocation Technology
+* CBM       Capacity BitMasks
+* CDP       Code and Data Prioritization
+* COS/CLOS  Class of Service
+* MBA       Memory Bandwidth Allocation
+* MSRs      Model-Specific Registers
+* PSR       Intel Platform Shared Resource
+* THRTL     Throttle value or delay value
+
+# Overview
+
+The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
+control over the memory bandwidth available per core. It gives the
+OS/hypervisor the ability to slow misbehaving apps/domains or to build an
+advanced closed-loop control system by exposing control over a credit-based
+throttling mechanism.
+
+# User details
+
+* Feature Enabling:
+
+ Add "psr=mba" to the Xen boot command line to enable the MBA feature.
+
+* xl interfaces:
+
+ 1. `psr-mba-show [domain-id]`:
+
+ Show memory bandwidth throttling for a domain.
+
+ 2. `psr-mba-set [OPTIONS] domain-id throttling`:
+
+ Set memory bandwidth throttling for a domain.
+
+ Options:
+ '-s': Specify the socket to process, otherwise all sockets are processed.
+
+ The throttling value written to the register expresses how much memory
+ bandwidth is blocked, i.e. a higher throttling value results in lower
+ bandwidth. The max throttling value (MBA_MAX) can be read through CPUID.
+
+ The response to the throttling value is either linear or non-linear.
+
+ Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
+ if the MBA_MAX value is 90, the input precision is 10%. Values that are not
+ an even multiple of the precision (e.g., 12%) are rounded down (e.g., to a
+ 10% delay) by HW automatically.
+
+ Non-linear mode: input delay values are powers of two from zero to the
+ MBA_MAX value from CPUID. Any value that is not a power of two is rounded
+ down to the nearest power of two by HW automatically.
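The rounding rules above can be sketched in C to predict which value the hardware would end up applying for a given request. This is only an illustration: the helper names `mba_round_linear` and `mba_round_nonlinear` are hypothetical and not part of Xen or xl, and the rounding itself is done by the hardware, not by software. Requests above MBA_MAX are not covered by the text and are not handled here.

```c
/* Hypothetical helper (not a Xen API): linear mode.
 * The precision is 100 - mba_max; a request is rounded down to the
 * nearest multiple of that precision. */
static unsigned int mba_round_linear(unsigned int thrtl, unsigned int mba_max)
{
    unsigned int precision;

    if (mba_max >= 100)          /* assumption: avoid a zero or wrapped precision */
        return thrtl;

    precision = 100 - mba_max;
    return thrtl - (thrtl % precision);
}

/* Hypothetical helper (not a Xen API): non-linear mode.
 * A request is rounded down to the largest power of two that does not
 * exceed it; zero stays zero. */
static unsigned int mba_round_nonlinear(unsigned int thrtl)
{
    unsigned int p = 1;

    if (thrtl == 0)
        return 0;

    while (p <= thrtl / 2)       /* p * 2 never exceeds thrtl, so no overflow */
        p *= 2;

    return p;
}
```

With MBA_MAX = 90, `mba_round_linear(12, 90)` gives 10, matching the 12% to 10% example in the text; `mba_round_nonlinear(12)` gives 8.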
+
+# Technical details
+
+MBA is one of the Intel PSR features and shares the base PSR infrastructure
+in Xen.
+
+## Hardware perspective
+
+ MBA defines a range of MSRs to support specifying a delay value (Thrtl) per
+ COS, with details below.
+
+ ```
+ +----------------------------+----------------+
+ | MSR (per socket) | Address |
+ +----------------------------+----------------+
+ | IA32_L2_QOS_Ext_BW_Thrtl_0 | 0xD50 |
+ +----------------------------+----------------+
+ | ... | ... |
+ +----------------------------+----------------+
+ | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n (n<64) |
+ +----------------------------+----------------+
+ ```
+
+ When a context switch happens, the COS ID of the VCPU is written to the
+ per-thread MSR `IA32_PQR_ASSOC`, and the hardware then enforces bandwidth
+ allocation according to the throttling value stored in the COS register.
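As a rough illustration of the register layout described above, the following C sketch computes the throttling MSR address for a given COS and the value that would be written to `IA32_PQR_ASSOC`. The helper and macro names are hypothetical, not Xen identifiers. The base address 0xD50 and the limit of 64 registers come from the table above; the `IA32_PQR_ASSOC` field layout (RMID in bits 9:0, COS in bits 63:32) is taken from the Intel SDM rather than from this document.

```c
#include <stdint.h>

#define MBA_THRTL_MSR_BASE   0xD50u  /* IA32_L2_QOS_Ext_BW_Thrtl_0 */
#define MBA_THRTL_MSR_COUNT  64u     /* the table allows n < 64 */

/* Hypothetical helper: MSR address holding the delay value of a COS.
 * Returns 0 when the COS is outside the range the table allows. */
static uint32_t mba_thrtl_msr(unsigned int cos)
{
    if (cos >= MBA_THRTL_MSR_COUNT)
        return 0;

    return MBA_THRTL_MSR_BASE + cos;
}

/* Hypothetical helper: value for IA32_PQR_ASSOC on a context switch.
 * Layout per the Intel SDM: RMID in bits 9:0, COS in bits 63:32. */
static uint64_t pqr_assoc_value(unsigned int rmid, unsigned int cos)
{
    return ((uint64_t)cos << 32) | (rmid & 0x3ffu);
}
```

The actual MSR writes need ring-0 access and are left out; the sketch only shows how the COS ID selects both the throttling register and the association value.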
+
+## The relationship between MBA and CAT/CDP
+
+ Generally speaking, MBA is completely independent of CAT/CDP, and any
+ combination may be applied at any time, e.g. enabling MBA with CAT
+ disabled.
+
+ Note, however, that MBA shares the COS infrastructure with CAT, although
+ MBA is enumerated by a different CPUID leaf from CAT (so the max COS of
+ MBA may differ from that of CAT). In some cases, a domain is permitted to
+ have a COS that is beyond the max COS of one (or more) of the PSR features
+ but within that of the others. For instance, assume the max COS of MBA is 8
+ but the max COS of L3 CAT is 16. When a domain is assigned COS 9, the
+ L3 CAT CBM associated with COS 9 is enforced, but for MBA the HW behaves
+ as if the default value were set, since COS 9 is beyond the max COS (8)
+ of MBA.
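The out-of-range case described above can be modelled with a small C sketch. The structure and function names are hypothetical and do not mirror the Xen data structures; the sketch only captures the stated behaviour that a COS beyond the MBA max COS is treated as if the default throttling value were set.

```c
#define MBA_COS_SLOTS 64u            /* assumption: at most 64 Thrtl registers */

/* Hypothetical per-socket view of MBA state (not the Xen structure). */
struct mba_socket {
    unsigned int cos_max;                /* highest COS ID MBA supports */
    unsigned int thrtl_default;          /* default throttling value */
    unsigned int thrtl[MBA_COS_SLOTS];   /* programmed value per COS */
};

/* Throttling value that effectively applies to a domain using `cos`.
 * A COS beyond the MBA max falls back to the default value. */
static unsigned int mba_effective_thrtl(const struct mba_socket *s,
                                        unsigned int cos)
{
    if (cos > s->cos_max || cos >= MBA_COS_SLOTS)
        return s->thrtl_default;

    return s->thrtl[cos];
}
```

With `cos_max` set to 8, a lookup for COS 9 returns the default value, while the L3 CAT CBM for COS 9 (not modelled here) would still be enforced.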
+
+## Design Overview
+
+* Core COS/Thrtl association
+
+ When enforcing Memory Bandwidth Allocation, the cores of all domains have
+ the same default COS (COS0), which stores the same Thrtl (0). The default
+ COS is used only in the hypervisor and is transparent to the tool stack
+ and the user.
+
+ The system administrator can change the PSR allocation policy at runtime
+ through the tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds
+ to a 2-tuple like [CBM, Thrtl] when only CAT is enabled; when CDP is
+ enabled, the COS corresponds to a 3-tuple like [Code_CBM, Data_CBM, Thrtl].
+ If neither CAT nor CDP is enabled, things are simpler: one COS corresponds
+ to one Thrtl.
+
+* VCPU schedule
+
+ This part reuses CAT COS infrastructure.
+
+* Multi-sockets
+
+ Different sockets may have different MBA capabilities (such as max COS),
+ although the capability is consistent within a socket. The MBA capability
+ is therefore recorded per socket.
+
+ This part reuses CAT COS infrastructure.
+
+## Implementation Description
+
+* Hypervisor interfaces:
+
+ 1. Boot line param: "psr=mba" to enable the feature.
+
+ 2. SYSCTL:
+ - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information.
+
+ 3. DOMCTL:
+ - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain.
+ - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain.
+
+* xl interfaces:
+
+ 1. psr-mba-show [domain-id]
+ Show system/domain runtime MBA throttling value.
+ => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL
+
+ 2. psr-mba-set [OPTIONS] domain-id throttling
+ Set bandwidth throttling for a domain.
+ => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL
+
+ 3. psr-hwinfo
+ Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA.
+ => XEN_SYSCTL_PSR_MBA_get_info
+
+* Key data structure:
+
+ 1. Feature HW info
+
+ ```
+ struct {
+     unsigned int thrtl_max; /* max throttling value that can be set */
+     unsigned int linear;    /* whether the delay value response is linear */
+ } mba_info;
+ ```
+
+ - Member `thrtl_max`
+
+ `thrtl_max` is the max throttling value to be set.
+
+ - Member `linear`
+
+ `linear` indicates whether the response to the delay value is linear.
+
+ As mentioned above, MBA is one of the Intel PSR features and shares the
+ base PSR infrastructure in Xen. For example, 'cos_max' is a common HW
+ property of all features. For details of the other data structures,
+ please refer to 'intel_psr_cat_cdp.pandoc'.
+
+# Limitations
+
+MBA only works on HW that supports it (check via CPUID).
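A minimal sketch of the CPUID check, assuming the enumeration described in the Intel SDM rather than anything stated in this document: bit 3 of EBX in CPUID.(EAX=10H, ECX=0) reports MBA, and CPUID.(EAX=10H, ECX=3) reports the max throttling value minus one in EAX[11:0], the linear flag in ECX bit 2 and the max COS in EDX[15:0]. The functions take raw register values so the sketch stays independent of how CPUID is executed; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded MBA properties. */
struct mba_cpuid_info {
    unsigned int thrtl_max;   /* max throttling value (MBA_MAX) */
    bool linear;              /* true if the delay response is linear */
    unsigned int cos_max;     /* highest COS ID supported by MBA */
};

/* EBX of CPUID.(EAX=10H, ECX=0): bit 3 reports MBA support. */
static bool mba_supported(uint32_t ebx)
{
    return (ebx >> 3) & 1u;
}

/* Registers of CPUID.(EAX=10H, ECX=3), the MBA sub-leaf. */
static struct mba_cpuid_info mba_decode_cpuid(uint32_t eax, uint32_t ecx,
                                              uint32_t edx)
{
    struct mba_cpuid_info info;

    info.thrtl_max = (eax & 0xfffu) + 1;  /* EAX[11:0] holds max - 1 */
    info.linear = (ecx >> 2) & 1u;        /* ECX bit 2: linear response */
    info.cos_max = edx & 0xffffu;         /* EDX[15:0]: max COS */

    return info;
}
```

Decoding EAX=89, ECX=4, EDX=7 yields a max throttling value of 90, linear mode and a max COS of 7.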
+
+# Testing
+
+The following commands can be used to verify MBA on HW that supports it.
+
+For example:
+ root@:~$ xl psr-hwinfo --mba
+ Memory Bandwidth Allocation (MBA):
+ Socket ID : 0
+ Linear Mode : Enabled
+ Maximum COS : 7
+ Maximum Throttling Value: 90
+ Default Throttling Value: 0
+
+ root@:~$ xl psr-mba-set 1 0xa
+
+ root@:~$ xl psr-mba-show 1
+ Socket ID : 0
+ Default THRTL : 0
+ ID NAME THRTL
+ 1 ubuntu14 0xa
+
+# Areas for improvement
+
+A hexadecimal number is currently used to show THRTL for a domain, which may
+not be user-friendly.
+
+To improve this, the libxl interfaces can be wrapped in libvirt to provide
+more user-friendly interfaces, e.g. showing a percentage in linear mode.
+
+# Known issues
+
+N/A
+
+# References
+
+"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2017-01-10 1.0      Xen 4.9  Design document written
+2017-07-10 1.1      Xen 4.10 Changes:
+                             1. Modify data structure according to latest
+                                codes;
+                             2. Add content for 'Areas for improvement';
+                             3. Other minor changes.
+---------- -------- -------- -------------------------------------------
This patch creates the MBA feature document in docs/features/. It describes
the key points of implementing MBA, which is described in detail in the
Intel SDM section "Introduction to Memory Bandwidth Allocation".

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
 docs/features/intel_psr_mba.pandoc | 244 +++++++++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100644 docs/features/intel_psr_mba.pandoc