
[v6,01/24] docs: create Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) feature document

Message ID 1486541776-8406-2-git-send-email-yi.y.sun@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Yi Sun Feb. 8, 2017, 8:15 a.m. UTC
This patch creates CAT and CDP feature document in doc/features/. It describes
key points to implement L3 CAT/CDP and L2 CAT which is described in details in
Intel SDM "INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES".

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
v6:
    - write a new feature document to cover L3 CAT/CDP and L2 CAT.
    - adjust 'Terminology' position in document.
    - fix wordings.
    - add SDM chapter title in commit message.
    - add more explanations.
---
 docs/features/intel_psr_cat_cdp.pandoc | 453 +++++++++++++++++++++++++++++++++
 1 file changed, 453 insertions(+)
 create mode 100644 docs/features/intel_psr_cat_cdp.pandoc

Comments

Konrad Rzeszutek Wilk Feb. 8, 2017, 3:56 p.m. UTC | #1
On Wed, Feb 08, 2017 at 04:15:53PM +0800, Yi Sun wrote:
> This patch creates CAT and CDP feature document in doc/features/. It describes
> key points to implement L3 CAT/CDP and L2 CAT which is described in details in
> Intel SDM "INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES".
> 
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> ---
> v6:
>     - write a new feature document to cover L3 CAT/CDP and L2 CAT.
>     - adjust 'Terminology' position in document.
>     - fix wordings.
>     - add SDM chapter title in commit message.
>     - add more explanations.
> ---
>  docs/features/intel_psr_cat_cdp.pandoc | 453 +++++++++++++++++++++++++++++++++
>  1 file changed, 453 insertions(+)
>  create mode 100644 docs/features/intel_psr_cat_cdp.pandoc
> 
> diff --git a/docs/features/intel_psr_cat_cdp.pandoc b/docs/features/intel_psr_cat_cdp.pandoc
> new file mode 100644
> index 0000000..ebce2bd
> --- /dev/null
> +++ b/docs/features/intel_psr_cat_cdp.pandoc
> @@ -0,0 +1,453 @@
> +% Intel Cache Allocation Technology and Code and Data Prioritization Features
> +% Revision 1.0
> +
> +\clearpage
> +
> +# Basics
> +
> +---------------- ----------------------------------------------------
> +         Status: **Tech Preview**
> +
> +Architecture(s): Intel x86
> +
> +   Component(s): Hypervisor, toolstack
> +
> +       Hardware: L3 CAT: Haswell and beyond CPUs
> +                 CDP   : Broadwell and beyond CPUs
> +                 L2 CAT: Atom codename Goldmont and beyond CPUs
> +---------------- ----------------------------------------------------
> +
> +# Terminology
> +
> +* CAT         Cache Allocation Technology
> +* CBM         Capacity BitMasks
> +* CDP         Code and Data Prioritization
> +* COS/CLOS    Class of Service
> +* MSRs        Machine Specific Registers
> +* PSR         Intel Platform Shared Resource
> +
> +# Overview
> +
> +Intel provides a set of allocation capabilities including Cache Allocatation
> +Technology (CAT) and Code and Data Prioritization (CDP).
> +
> +CAT allows an OS or hypervisor to control allocation of a CPU's shared cache
> +based on application priority or Class of Service (COS). Each COS is configured
> +using capacity bitmasks (CBMs) which represent cache capacity and indicate the
> +degree of overlap and isolation between classes. Once CAT is configured, the pr-
> +ocessor allows access to portions of cache according to the established COS.
> +Intel Xeon processor E5 v4 family (and some others) introduce capabilities to
> +configure and make use of the CAT mechanism on the L3 cache. Intel Goldmont pro-
> +cessor provides support for control over the L2 cache.
> +
> +Code and Data Prioritization (CDP) Technology is an extension of CAT. CDP
> +enables isoloation and separate prioritization of code and data fetches to
           ^^^^^^^^^^
isolation

> +the L3 cahce in a SW configurable manner, which can enable workload priorit-
          ^^^^^
cache
> +ization and tuning of cache capacity to the characteristics of the workload.
> +CDP extends CAT by providing separate code and data masks per Class of Service
> +(COS). When SW configures to enable CDP, L3 CAT is disabled.
> +
> +# User details
> +
> +* Feature Enabling:
> +
> +  Add "psr=cat" to boot line parameter to enable all supported level CAT featu-
> +  res. Add "psr=cdp" to enable L3 CDP but disables L3 CAT by SW.
> +
> +* xl interfaces:
> +
> +  1. `psr-cat-show [OPTIONS] domain-id`:
> +
> +     Show L2 CAT or L3 CAT/CDP CBM of the domain designated by Xen domain-id.
> +
> +     Option `-l`:
> +     `-l2`: Show cbm for L2 cache.
> +     `-l3`: Show cbm for L3 cache.
> +
> +     If `-lX` is specified and LX is not supported, print error.
> +     If no `-l` is specified, level 3 is the default option.
> +
> +  2. `psr-cat-set [OPTIONS] domain-id cbm`:
> +
> +     Set L2 CAT or L3 CAT/CDP CBM to the domain designated by Xen domain-id.
> +
> +     Option `-s`: Specify the socket to process, otherwise all sockets are
> +     processed.
> +
> +     Option `-l`:
> +     `-l2`: Specify cbm for L2 cache.
> +     `-l3`: Specify cbm for L3 cache.
> +
> +     If `-lX` is specified and LX is not supported, print error.
> +     If no `-l` is specified, level 3 is the default option.
> +
> +     Option `-c` or `-d`:
> +     `-c`: Set L3 CDP code cbm.
> +     `-d`: Set L3 CDP data cbm.
> +
> +  3. `psr-hwinfo [OPTIONS]`:
> +
> +     Show CMT & L2 CAT & L3 CAT/CDP HW information on every socket.
> +
> +     Option `-m, --cmt`: Show Cache Monitoring Technology (CMT) hardware info.
> +
> +     Option `-a, --cat`: Show CAT/CDP hardware info.
> +
> +# Technical details
> +
> +L3 CAT/CDP and L2 CAT are all members of Intel PSR features, they share the base
> +PSR infrastructure in Xen.
> +
> +## Hardware perspective
> +
> +  CAT/CDP defines a range of MSRs to assign different cache access patterns
> +  which are known as CBMs, each CBM is associated with a COS.
> +
> +  ```
> +  E.g. L2 CAT:
> +                          +----------------------------+----------------+
> +     IA32_PQR_ASSOC       | MSR (per socket)           |    Address     |
> +   +----+---+-------+     +----------------------------+----------------+
> +   |    |COS|       |     | IA32_L2_QOS_MASK_0         |     0xD10      |
> +   +----+---+-------+     +----------------------------+----------------+
> +          └-------------> | ...                        |  ...           |
> +                          +----------------------------+----------------+
> +                          | IA32_L2_QOS_MASK_n         | 0xD10+n (n<64) |
> +                          +----------------------------+----------------+
> +  ```
> +
> +  L3 CAT/CDP uses a range of MSRs from 0xC90 ~ 0xC90+n (n<128).
> +
> +  L2 CAT uses a range of MSRs from 0xD10 ~ 0xD10+n (n<64), following the L3
> +  CAT/CDP MSRs, setting different L2 cache accessing patterns from L3 cache is
> +  supported.
> +
> +  Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to the
> +  hardware indicating the cache space an application should be limited to as

s/application/VM/ ?

> +  well as providing an indication of overlap and isolation in the CAT-capable
> +  cache from other applications contending for the cache.

s/application/VM/ ?

Perhaps 'domain' as you use that later in the document?


> +
> +  Sample cache capacity bitmasks for a bitlength of 8 are shown below. Please
> +  note that all (and only) contiguous '1' combinations are allowed (e.g. FFFFH,
> +  0FF0H, 003CH, etc.).
> +
> +  ```
> +       +----+----+----+----+----+----+----+----+
> +       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +       +----+----+----+----+----+----+----+----+
> +  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Default Bitmask
> +       +----+----+----+----+----+----+----+----+
> +  COS1 | A  | A  | A  | A  | A  | A  | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +  COS2 | A  | A  | A  | A  | A  | A  | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +
> +       +----+----+----+----+----+----+----+----+
> +       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +       +----+----+----+----+----+----+----+----+
> +  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Overlapped Bitmask
> +       +----+----+----+----+----+----+----+----+
> +  COS1 |    |    |    |    | A  | A  | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +  COS2 |    |    |    |    |    |    | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +
> +       +----+----+----+----+----+----+----+----+
> +       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
> +       +----+----+----+----+----+----+----+----+
> +  COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
> +       +----+----+----+----+----+----+----+----+
> +  COS1 |    |    |    |    | A  | A  |    |    |
> +       +----+----+----+----+----+----+----+----+
> +  COS2 |    |    |    |    |    |    | A  | A  |
> +       +----+----+----+----+----+----+----+----+
> +  ```
> +
> +  We can get the CBM length through CPUID. The default value of CBM is calcul-
> +  ated by `(1ull << cbm_len) - 1`. That is a fully open bitmask, all ones bitm-
> +  ask. The COS[0] always stores the default value without change.
> +
> +  There is a `IA32_PQR_ASSOC` register which stores the COS ID of the VCPU. HW
> +  enforces cache allocation according to the corresponding CBM.
> +
> +## The relationship between L3 CAT/CDP and L2 CAT
> +
> +  HW may support all features. By default, CDP is disabled on the processor.
> +  If the L3 CAT MSRs are used without enabling CDP, the processor operates in
> +  a traditional CAT-only mode. When CDP is enabled,

s/,/:/
> +  * the CAT mask MSRs are re-mapped into interleaved pairs of mask MSRs for
> +    data or code fetches.
> +  * the range of COS for CAT is re-indexed, with the lower-half of the COS
> +    range available for CDP.
> +
> +  L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled while
> +  L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP are both enabled.
> +
> +  As a requirement, the bits of CBM of CAT/CDP must be continuous.
> +
> +  N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same associate
> +  register `IA32_PQR_ASSOC`, which means one COS is associated with a pair of
> +  L2 CAT CBM and L3 CAT/CDP CBM.
> +
> +  Besides, the max COS of L2 CAT may be different from L3 CAT/CDP (or other
> +  PSR features in future). In some cases, a VM is permitted to have a COS

I noticed you say 'domain' later on in the document. Would it make
sense to replace s/VM/domain/ to be same in this design?

> +  that is beyond one (or more) of PSR features but within the others. For
> +  instance, let's assume the max COS of L2 CAT is 8 but the max COS of L3
> +  CAT is 16, when a VM is assigned 9 as COS, the L3 CAT CBM associated to
> +  COS 9 would be enforced, but for L2 CAT, the HW works as default value is
> +  set since COS 9 is beyond the max COS (8) of L2 CAT.
> +
> +## Design Overview
> +
> +* Core COS/CBM association
> +
> +  When enforcing CAT/CDP, all cores of domains have the same default COS (COS0)
> +  which is associated with the fully open CBM (all ones bitmask) to access all
> +  cache. The default COS is used only in hypervisor and is transparent to tool
> +  stack and user.
> +
> +  System administrator can change PSR allocation policy at runtime by tool stack.
> +  Since L2 CAT shares COS with L3 CAT/CDP, a COS corresponds to a 2-tuple, like
> +  [L2 CBM, L3 CBM] with only-CAT enabled, when CDP is enabled, one COS correspo-
> +  nds to a 3-tuple, like [L2 CBM, L3 Code_CBM, L3 Data_CBM]. If neither L3 CAT
> +  nor L3 CDP is enabled, things would be easier, one COS corresponds to one L2
> +  CBM.
> +
> +* VCPU schedule
> +
> +  When context switch happens, the COS of VCPU is written to per-thread MSR
> +  `IA32_PQR_ASSOC`, and then hardware enforces cache allocation according to
> +  the corresponding CBM.
> +
> +* Multi-sockets
> +
> +  Different sockets may have different CAT/CDP capability (e.g. max COS) alth-
> +  ough it is consistent on the same socket. So the capability of per-socket CAT/
> +  CDP is specified.
> +
> +  'psr-cat-set' can set CBM for one domain per socket. On each socket, we main-
> +  tain a COS array for all domains. One domain uses one COS at one time. One COS
> +  stores the CBM of the domain to work. So, when a VCPU of the domain is migrat-
> +  ed from socket 1 to socket 2, it follows configuration on socket 2.
> +
> +  E.g. user sets domain 1 CBM on socket 1 to 0x7f which uses COS 9 but sets do-
> +  main 1 CBM on socket 2 to 0x3f which uses COS 7. When VCPU of this domain
> +  is migrated from socket 1 to 2, the COS ID used is 7, that means 0x3f is the
> +  CBM to work for this domain 1 now.
> +
> +## Implementation Description
> +
> +* Hypervisor interfaces:
> +
> +  1. Boot line parameter "psr=cat" enables L2 CAT and L3 CAT if hardware suppo-
> +     rted. "psr=cdp" enables CDP if hardware supported.
> +
> +  2. SYSCTL:
> +          - XEN_SYSCTL_PSR_CAT_get_l3_info: Get L3 CAT/CDP information.
> +          - XEN_SYSCTL_PSR_CAT_get_l2_info: Get L2 CAT information.
> +
> +  3. DOMCTL:
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM: Get L3 CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM: Set L3 CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE: Get CDP Code CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE: Set CDP Code CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA: Get CDP Data CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA: Set CDP Data CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM: Get L2 CBM for a domain.
> +          - XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM: Set L2 CBM for a domain.
> +
> +* xl interfaces:
> +
> +  1. psr-cat-show -lX domain-id
> +          Show LX cbm for a domain.
> +          => XEN_SYSCTL_PSR_CAT_get_l3_info    /
> +             XEN_SYSCTL_PSR_CAT_get_l2_info    /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM  /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA /
> +             XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM
> +
> +  2. psr-cat-set -lX domain-id cbm
> +          Set LX cbm for a domain.
> +          => XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM  /
> +             XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE /
> +             XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA /
> +             XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM
> +
> +  3. psr-hwinfo
> +          Show PSR HW information, including L3 CAT/CDP/L2 CAT
> +          => XEN_SYSCTL_PSR_CAT_get_l3_info /
> +             XEN_SYSCTL_PSR_CAT_get_l2_info
> +
> +* Key data structure:
> +
> +   1. Feature HW info
> +
> +      ```
> +      struct psr_cat_hw_info {
> +          unsigned int cbm_len;
> +          unsigned int cos_max;
> +      };
> +      ```
> +
> +      - Member `cbm_len`
> +
> +        `cbm_len` is one of the hardware info of CAT. It means the max number
> +        of bits to set.
> +
> +      - Member `cos_max`
> +
> +        `cos_max` is one of the hardware info of CAT. It means the max number
> +        of COS registers.
> +
> +   2. Feature list node
> +
> +      ```
> +      struct feat_node {
> +          enum psr_feat_type feature;
> +          struct feat_ops ops;
> +          struct psr_cat_hw_info info;
> +          uint64_t cos_reg_val[MAX_COS_REG_NUM];
> +          struct list_head list;
> +      };
> +      ```
> +
> +      When a PSR enforcement feature is enabled, it will be added into a
> +      feature list. The head of the list is created in psr initialization.
> +
> +      - Member `feature`
> +
> +        `feature` is an integer number, to indicate which feature the list entry
> +        corresponds to.
> +
> +      - Member `ops`
> +
> +        `ops` maintains a callback function list of the feature. It will be introduced
> +        in details later at `4. Feature operation functions structure`.

I think you can just do:

[Feature operation functions structure]

And when you run `pandoc -toc -o intel_psr_cat_cdp.pdf
intel_psr_cat_cdp.pandoc`

it will provide the right link (which you can follow) to the proper
section.
> +
> +      - Member `info`
> +
> +        `info` maintains the feature HW information which are provided to psr_hwinfo
> +        command.
> +
> +      - Member `cos_reg_val`
> +
> +        `cos_reg_val` is an array to maintain the value set in all COS registers of
> +        the feature. The array is indexed by COS ID.
> +
> +   3. Per-socket PSR features information structure
> +
> +      ```
> +      struct psr_socket_info {
> +          unsigned int feat_mask;
> +          unsigned int nr_feat;
> +          struct list_head feat_list;
> +          unsigned int cos_ref[MAX_COS_REG_NUM];
> +          spinlock_t ref_lock;
> +      };
> +      ```
> +
> +      We collect all PSR allocation features information of a socket in this
> +      `struct psr_socket_info`.
> +
> +      - Member `feat_mask`
> +
> +        `feat_mask` is a bitmap, to indicate which feature is enabled on current
> +        socket. We define `feat_mask` bitmap as:
> +
> +        bit 0: L3 CAT status.
> +        bit 1: L3 CDP status.
> +        bit 2: L2 CAT status.

In case the code changes and more bit positions are added, I would
recommend replacing the "We define `feat_mask` bitmap as: bit 0 ..."
text with:

"See values defined in 'enum psr_feat_type'"

That way the doc is less likely to become outdated when the code
changes.
> +
> +      - Member `nr_feat`
> +
> +        `nr_feat` means the number of PSR features enabled.
> +
> +      - Member `cos_ref`
> +
> +        `cos_ref` is an array which maintains the reference of one COS. It maps
> +        to cos_reg_val[MAX_COS_REG_NUM] in `struct feat_node`. If one COS is
> +        used by one domain, the corresponding reference will increase by one. If
> +        a domain releases the COS, the reference will decrease by one. The array
> +        is indexed by COS ID.
> +
> +   4. Feature operation functions structure
> +
> +      ```
> +      struct feat_ops {
> +          unsigned int (*get_cos_max)(const struct feat_node *feat);
> +          int (*get_feat_info)(const struct feat_node *feat,
> +                               uint32_t data[], uint32_t array_len);
> +          int (*get_val)(const struct feat_node *feat, unsigned int cos,
> +                         enum cbm_type type, uint64_t *val);
> +          unsigned int (*get_cos_num)(const struct feat_node *feat);
> +          int (*get_old_val)(uint64_t val[],
> +                             const struct feat_node *feat,
> +                             unsigned int old_cos);
> +          int (*set_new_val)(uint64_t val[],
> +                             const struct feat_node *feat,
> +                             unsigned int old_cos,
> +                             enum cbm_type type,
> +                             uint64_t m);
> +          int (*compare_val)(const uint64_t val[], const struct feat_node *feat,
> +                             unsigned int cos, bool *found);
> +          unsigned int (*fits_cos_max)(const uint64_t val[],
> +                                       const struct feat_node *feat,
> +                                       unsigned int cos);
> +          int (*write_msr)(unsigned int cos, const uint64_t val[],
> +                           struct feat_node *feat);
> +      };
> +      ```
> +
> +      We abstract above callback functions to encapsulate the feature specific
> +      behaviors into them. Then, it is easy to add a new feature. We just need:
> +          1) Implement such ops and callback functions for every feature.
> +          2) Register the ops into `struct feat_node`.
> +          3) Add the feature into feature list during CPU initialization.
> +
> +# Limitations
> +
> +CAT/CDP can only work on HW which enables it(check by CPUID). So far, there is
> +no HW which enables both L2 CAT and L3 CAT/CDP. But SW implementation has cons-
> +idered such scenario to enable both L2 CAT and L3 CAT/CDP.
> +
> +# Testing
> +
> +We can execute above xl commands to verify L2 CAT and L3 CAT/CDP on different
> +HWs support them.
> +
> +For example:
> +    root@:~$ xl psr-hwinfo --cat
> +    Cache Allocation Technology (CAT): L2
> +    Socket ID       : 0
> +    Maximum COS     : 3
> +    CBM length      : 8
> +    Default CBM     : 0xff
> +
> +    root@:~$ xl psr-cat-cbm-set -l2 1 0x7f
> +
> +    root@:~$ xl psr-cat-show -l2 1
> +    Socket ID       : 0
> +    Default CBM     : 0xff
> +       ID                     NAME             CBM
> +        1                 ubuntu14            0x7f
> +
> +# Areas for improvement
> +
> +N/A

I would say that using '0x7f' is not very user-friendly. It really
would be good if that changed to something easier to grok.

For example if I am system admin and I look at:

       +----+----+----+----+----+----+----+----+
       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
       +----+----+----+----+----+----+----+----+
  COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
       +----+----+----+----+----+----+----+----+
  COS1 |    |    |    |    | A  | A  |    |    |
       +----+----+----+----+----+----+----+----+
  COS2 |    |    |    |    |    |    | A  | A  |
       +----+----+----+----+----+----+----+----+

I would think that giving a guest 'M7->M4' means it has more
cache than M3->M2 or M1->M0.

But that is not spelled out in detail. Or what happens if I do:
       +----+----+----+----+----+----+----+----+
       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
       +----+----+----+----+----+----+----+----+
  COS0 |    |    |    |    |    | A  |    |    | Isolated Bitmask
       +----+----+----+----+----+----+----+----+
  COS1 |    |    |    |    |    |    | A  |    |
       +----+----+----+----+----+----+----+----+
  COS2 |    |    |    |    |    |    |    | A  |
       +----+----+----+----+----+----+----+----+

Does that have the same effect as the previous one?
I would think not, but perhaps it is the same (we set
three 'pools').

And does this mean that I've made a grave error
and M7->M3 are in effect only available to the hypervisor (and
dom0?, but only if dom0 is PV, but not for PVH dom0?)

Thanks!
> +
> +# Known issues
> +
> +N/A
> +
> +# References
> +
> +"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
> +
> +# History
> +
> +------------------------------------------------------------------------
> +Date       Revision Version  Notes
> +---------- -------- -------- -------------------------------------------
> +2016-08-12 1.0      Xen 4.9  Design document written

Perhaps update that a bit? I think we are at 1.6 ?
> +---------- -------- -------- -------------------------------------------
> -- 
> 1.9.1
>
Yi Sun Feb. 9, 2017, 6:38 a.m. UTC | #2
On 17-02-08 10:56:55, Konrad Rzeszutek Wilk wrote:
> On Wed, Feb 08, 2017 at 04:15:53PM +0800, Yi Sun wrote:
> > +  Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to the
> > +  hardware indicating the cache space an application should be limited to as
> 
> s/application/VM/ ?
> 
> > +  well as providing an indication of overlap and isolation in the CAT-capable
> > +  cache from other applications contending for the cache.
> 
> s/application/VM/ ?
> 
> Perhaps 'domain' as you use that later in the document?
> 
> > +  PSR features in future). In some cases, a VM is permitted to have a COS
> 
> I noticed you say 'domain' later on in the document. Would it make
> sense to replace s/VM/domain/ to be same in this design?
> 
I think 'domain' would be better.

> > +      - Member `ops`
> > +
> > +        `ops` maintains a callback function list of the feature. It will be introduced
> > +        in details later at `4. Feature operation functions structure`.
> 
> I think you can just do:
> 
> [Feature operation functions structure]
> 
> And when you run `pandoc -toc -o intel_psr_cat_cdp.pdf
> intel_psr_cat_cdp.pandoc`
> 
> it will provide the right link (which you can follow) to the proper
> section.

I tried this command but encountered the error below.

$ pandoc -toc -o intel_psr_cat_cdp.pdf docs/features/intel_psr_cat_cdp.pandoc
pandoc: cannot produce pdf output with oc writer

> > +    root@:~$ xl psr-cat-cbm-set -l2 1 0x7f
> > +
> > +    root@:~$ xl psr-cat-show -l2 1
> > +    Socket ID       : 0
> > +    Default CBM     : 0xff
> > +       ID                     NAME             CBM
> > +        1                 ubuntu14            0x7f
> > +
> > +# Areas for improvement
> > +
> > +N/A
> 
> I would say that using '0x7f' is not very user-friendly. It really
> would be good if that changed to something easier to grok.
> 
I agree that '0x7f' is not user-friendly. It requires the user to know some HW
details.

> For example if I am system admin and I look at:
> 
>        +----+----+----+----+----+----+----+----+
>        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
>        +----+----+----+----+----+----+----+----+
>   COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
>        +----+----+----+----+----+----+----+----+
>   COS1 |    |    |    |    | A  | A  |    |    |
>        +----+----+----+----+----+----+----+----+
>   COS2 |    |    |    |    |    |    | A  | A  |
>        +----+----+----+----+----+----+----+----+
> 
> I would think that giving a guest 'M7->M4' means it has more
> cache than M3->M2 or M1->M0.
> 
> But that is not spelled out in detail. Or what happens if I do:
>        +----+----+----+----+----+----+----+----+
>        | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
>        +----+----+----+----+----+----+----+----+
>   COS0 |    |    |    |    |    | A  |    |    | Isolated Bitmask
>        +----+----+----+----+----+----+----+----+
>   COS1 |    |    |    |    |    |    | A  |    |
>        +----+----+----+----+----+----+----+----+
>   COS2 |    |    |    |    |    |    |    | A  |
>        +----+----+----+----+----+----+----+----+
> 
> Does that have the same effect as the previous one?
> I would think not, but perhaps it is the same (we set
> three 'pools').
> 
No, this has a different effect. Each bit represents a quantity of cache
capacity, e.g. 1 bit equals 1M. In the first case, COS0 uses 4M of cache, COS1
uses 2M, and COS2 uses 2M. In the second case, each of the three COS uses 1M.
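
To make the arithmetic concrete, here is a rough sketch (not code from the
patch series). It assumes the "1 bit equals 1M" figure from the example above;
the real per-bit capacity depends on the cache size and the CBM length
reported by CPUID. The names `KIB_PER_CBM_BIT`, `cbm_weight` and
`cbm_capacity_kib` are made up for illustration.

```c
#include <stdint.h>

/* Assumption for illustration only: each CBM bit stands for 1 MiB of cache. */
#define KIB_PER_CBM_BIT 1024u

/* Count the bits set in a capacity bitmask. */
static unsigned int cbm_weight(uint64_t cbm)
{
    unsigned int n = 0;

    while (cbm) {
        cbm &= cbm - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Cache capacity, in KiB, that a CBM grants under the assumption above. */
static unsigned int cbm_capacity_kib(uint64_t cbm)
{
    return cbm_weight(cbm) * KIB_PER_CBM_BIT;
}
```

With these helpers the first layout (0xf0, 0x0c, 0x03) gives 4096, 2048 and
2048 KiB, while the second layout (0x04, 0x02, 0x01) gives 1024 KiB each.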

Although '0x7F' is not user-friendly, it gives the user control over the
granularity of cache allocation. Furthermore, the implementation in Xen is a
low-level interface and we should give users the most flexibility. I think
an upper layer, e.g. libvirt, could add a wrapper to provide a more
user-friendly interface.
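
As a sketch of what such a wrapper might do (purely hypothetical, not an
existing libvirt or xl interface), an upper layer could accept "N bits
starting at bit S" and turn that into a raw CBM, rejecting anything that
violates the contiguity rule stated in the document. The function names
`cbm_is_valid` and `cbm_from_range` are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if cbm is non-zero, fits within cbm_len bits, and its set bits
 * form one contiguous run.
 */
static bool cbm_is_valid(uint64_t cbm, unsigned int cbm_len)
{
    uint64_t full, low, sum;

    if (cbm_len == 0 || cbm_len > 63)
        return false;

    full = (1ull << cbm_len) - 1;       /* fully open bitmask */
    if (cbm == 0 || (cbm & ~full))
        return false;

    /*
     * Adding the lowest set bit to a contiguous run carries through the
     * whole run and leaves exactly one bit set. A run with a hole in it
     * leaves more than one bit set.
     */
    low = cbm & (~cbm + 1);
    sum = cbm + low;
    return (sum & (sum - 1)) == 0;
}

/*
 * Build a CBM of nr_bits contiguous ones starting at bit 'shift'.
 * Returns 0 if the request does not fit into cbm_len bits.
 */
static uint64_t cbm_from_range(unsigned int shift, unsigned int nr_bits,
                               unsigned int cbm_len)
{
    if (nr_bits == 0 || cbm_len > 63 || shift >= cbm_len ||
        nr_bits > cbm_len - shift)
        return 0;

    return ((1ull << nr_bits) - 1) << shift;
}
```

For a CBM length of 8, asking for 7 bits starting at bit 0 yields 0x7f, the
value used in the testing example.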

Thanks,
Sun Yi

Patch

diff --git a/docs/features/intel_psr_cat_cdp.pandoc b/docs/features/intel_psr_cat_cdp.pandoc
new file mode 100644
index 0000000..ebce2bd
--- /dev/null
+++ b/docs/features/intel_psr_cat_cdp.pandoc
@@ -0,0 +1,453 @@ 
+% Intel Cache Allocation Technology and Code and Data Prioritization Features
+% Revision 1.0
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Tech Preview**
+
+Architecture(s): Intel x86
+
+   Component(s): Hypervisor, toolstack
+
+       Hardware: L3 CAT: Haswell and beyond CPUs
+                 CDP   : Broadwell and beyond CPUs
+                 L2 CAT: Atom codename Goldmont and beyond CPUs
+---------------- ----------------------------------------------------
+
+# Terminology
+
+* CAT         Cache Allocation Technology
+* CBM         Capacity BitMasks
+* CDP         Code and Data Prioritization
+* COS/CLOS    Class of Service
+* MSRs        Machine Specific Registers
+* PSR         Intel Platform Shared Resource
+
+# Overview
+
+Intel provides a set of allocation capabilities including Cache Allocatation
+Technology (CAT) and Code and Data Prioritization (CDP).
+
+CAT allows an OS or hypervisor to control allocation of a CPU's shared cache
+based on application priority or Class of Service (COS). Each COS is configured
+using capacity bitmasks (CBMs) which represent cache capacity and indicate the
+degree of overlap and isolation between classes. Once CAT is configured, the pr-
+ocessor allows access to portions of cache according to the established COS.
+Intel Xeon processor E5 v4 family (and some others) introduce capabilities to
+configure and make use of the CAT mechanism on the L3 cache. Intel Goldmont pro-
+cessor provides support for control over the L2 cache.
+
+Code and Data Prioritization (CDP) Technology is an extension of CAT. CDP
+enables isoloation and separate prioritization of code and data fetches to
+the L3 cahce in a SW configurable manner, which can enable workload priorit-
+ization and tuning of cache capacity to the characteristics of the workload.
+CDP extends CAT by providing separate code and data masks per Class of Service
+(COS). When SW configures to enable CDP, L3 CAT is disabled.
+
+# User details
+
+* Feature Enabling:
+
+  Add "psr=cat" to boot line parameter to enable all supported level CAT featu-
+  res. Add "psr=cdp" to enable L3 CDP but disables L3 CAT by SW.
+
+* xl interfaces:
+
+  1. `psr-cat-show [OPTIONS] domain-id`:
+
+     Show L2 CAT or L3 CAT/CDP CBM of the domain designated by Xen domain-id.
+
+     Option `-l`:
+     `-l2`: Show cbm for L2 cache.
+     `-l3`: Show cbm for L3 cache.
+
+     If `-lX` is specified and LX is not supported, print error.
+     If no `-l` is specified, level 3 is the default option.
+
+  2. `psr-cat-set [OPTIONS] domain-id cbm`:
+
+     Set L2 CAT or L3 CAT/CDP CBM to the domain designated by Xen domain-id.
+
+     Option `-s`: Specify the socket to process, otherwise all sockets are
+     processed.
+
+     Option `-l`:
+     `-l2`: Specify cbm for L2 cache.
+     `-l3`: Specify cbm for L3 cache.
+
+     If `-lX` is specified and LX is not supported, print error.
+     If no `-l` is specified, level 3 is the default option.
+
+     Option `-c` or `-d`:
+     `-c`: Set L3 CDP code cbm.
+     `-d`: Set L3 CDP data cbm.
+
+  3. `psr-hwinfo [OPTIONS]`:
+
+     Show CMT & L2 CAT & L3 CAT/CDP HW information on every socket.
+
+     Option `-m, --cmt`: Show Cache Monitoring Technology (CMT) hardware info.
+
+     Option `-a, --cat`: Show CAT/CDP hardware info.
+
+# Technical details
+
+L3 CAT/CDP and L2 CAT are all members of Intel PSR features, they share the base
+PSR infrastructure in Xen.
+
+## Hardware perspective
+
+  CAT/CDP defines a range of MSRs to assign different cache access patterns
+  which are known as CBMs, each CBM is associated with a COS.
+
+  ```
+  E.g. L2 CAT:
+                          +----------------------------+----------------+
+     IA32_PQR_ASSOC       | MSR (per socket)           |    Address     |
+   +----+---+-------+     +----------------------------+----------------+
+   |    |COS|       |     | IA32_L2_QOS_MASK_0         |     0xD10      |
+   +----+---+-------+     +----------------------------+----------------+
+          └-------------> | ...                        |  ...           |
+                          +----------------------------+----------------+
+                          | IA32_L2_QOS_MASK_n         | 0xD10+n (n<64) |
+                          +----------------------------+----------------+
+  ```
+
+  L3 CAT/CDP uses a range of MSRs from 0xC90 ~ 0xC90+n (n<128).
+
+  L2 CAT uses a range of MSRs from 0xD10 ~ 0xD10+n (n<64), following the L3
+  CAT/CDP MSRs, setting different L2 cache accessing patterns from L3 cache is
+  supported.
+
+  Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to the
+  hardware indicating the cache space an application should be limited to as
+  well as providing an indication of overlap and isolation in the CAT-capable
+  cache from other applications contending for the cache.
+
+  Sample cache capacity bitmasks for a bitlength of 8 are shown below. Please
+  note that all (and only) contiguous '1' combinations are allowed (e.g. FFFFH,
+  0FF0H, 003CH, etc.).
+
+  ```
+       +----+----+----+----+----+----+----+----+
+       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
+       +----+----+----+----+----+----+----+----+
+  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Default Bitmask
+       +----+----+----+----+----+----+----+----+
+  COS1 | A  | A  | A  | A  | A  | A  | A  | A  |
+       +----+----+----+----+----+----+----+----+
+  COS2 | A  | A  | A  | A  | A  | A  | A  | A  |
+       +----+----+----+----+----+----+----+----+
+
+       +----+----+----+----+----+----+----+----+
+       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
+       +----+----+----+----+----+----+----+----+
+  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Overlapped Bitmask
+       +----+----+----+----+----+----+----+----+
+  COS1 |    |    |    |    | A  | A  | A  | A  |
+       +----+----+----+----+----+----+----+----+
+  COS2 |    |    |    |    |    |    | A  | A  |
+       +----+----+----+----+----+----+----+----+
+
+       +----+----+----+----+----+----+----+----+
+       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
+       +----+----+----+----+----+----+----+----+
+  COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
+       +----+----+----+----+----+----+----+----+
+  COS1 |    |    |    |    | A  | A  |    |    |
+       +----+----+----+----+----+----+----+----+
+  COS2 |    |    |    |    |    |    | A  | A  |
+       +----+----+----+----+----+----+----+----+
+  ```
+
+  The CBM length is reported by CPUID. The default CBM value is calculated as
+  `(1ull << cbm_len) - 1`, which is a fully open (all-ones) bitmask. COS[0]
+  always holds the default value and is never changed.
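+
+  The default-value formula and the contiguity rule can be sketched in C as
+  below. This is only an illustration: the helper names `cbm_default` and
+  `cbm_is_valid` are hypothetical, and rejecting an all-zero CBM is an
+  assumption of this sketch rather than something stated in this document.
+
+  ```c
+  #include <stdbool.h>
+  #include <stdint.h>
+
+  /* Hypothetical helper: fully open CBM for cbm_len bits (1..63 here). */
+  static uint64_t cbm_default(unsigned int cbm_len)
+  {
+      return (1ull << cbm_len) - 1;
+  }
+
+  /*
+   * Hypothetical helper: true if cbm is non-zero (assumption), fits in
+   * cbm_len bits, and its set bits form a single contiguous run.
+   */
+  static bool cbm_is_valid(uint64_t cbm, unsigned int cbm_len)
+  {
+      uint64_t lowest;
+
+      if ( cbm == 0 || (cbm & ~cbm_default(cbm_len)) )
+          return false;
+
+      /* Isolate the lowest set bit. */
+      lowest = cbm & -cbm;
+
+      /*
+       * Adding the lowest set bit to a contiguous run carries through the
+       * whole run, so the sum shares no bit with the original value.
+       */
+      return ((cbm + lowest) & cbm) == 0;
+  }
+  ```
+
+  With a CBM length of 8, `cbm_is_valid(0x3c, 8)` holds, while 0x05 (a gap
+  between the set bits) and 0x1ff (wider than 8 bits) are rejected.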
+
+  The `IA32_PQR_ASSOC` register stores the COS ID of the VCPU. HW enforces
+  cache allocation according to the CBM associated with that COS.
+
+## The relationship between L3 CAT/CDP and L2 CAT
+
+  HW may support all features. By default, CDP is disabled on the processor.
+  If the L3 CAT MSRs are used without enabling CDP, the processor operates in
+  a traditional CAT-only mode. When CDP is enabled,
+  * the CAT mask MSRs are re-mapped into interleaved pairs of mask MSRs for
+    data or code fetches.
+  * the range of COS for CAT is re-indexed, with the lower-half of the COS
+    range available for CDP.
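+
+  As a rough illustration of the re-mapping, the sketch below computes the
+  mask MSR address for a COS in both modes. The base address 0xC90 comes from
+  the MSR range given earlier. The ordering inside each pair (data mask at the
+  even index, code mask at the odd index) follows the Intel SDM as far as we
+  know and should be checked against it. All names are hypothetical.
+
+  ```c
+  #include <stdint.h>
+
+  #define L3_QOS_MASK_BASE 0xC90u   /* first L3 mask MSR, see above */
+
+  /* Hypothetical type: which half of a CDP pair is wanted. */
+  enum cdp_type { CDP_DATA = 0, CDP_CODE = 1 };
+
+  /* CAT-only mode: one mask MSR per COS. */
+  static uint32_t l3_cat_mask_msr(unsigned int cos)
+  {
+      return L3_QOS_MASK_BASE + cos;
+  }
+
+  /*
+   * CDP mode: each COS owns an interleaved pair of mask MSRs, so only
+   * half as many COS IDs fit in the same MSR range.
+   */
+  static uint32_t l3_cdp_mask_msr(unsigned int cos, enum cdp_type type)
+  {
+      return L3_QOS_MASK_BASE + 2 * cos + (unsigned int)type;
+  }
+  ```
+
+  For example, COS 3 uses MSR 0xC93 in CAT-only mode, but the pair 0xC96
+  (data) and 0xC97 (code) once CDP is enabled.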
+
+  L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled while
+  L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP are both enabled.
+
+  As a requirement, the bits set in a CAT/CDP CBM must be contiguous.
+
+  N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same associate
+  register `IA32_PQR_ASSOC`, which means one COS is associated with a pair of
+  L2 CAT CBM and L3 CAT/CDP CBM.
+
+  Besides, the max COS of L2 CAT may differ from that of L3 CAT/CDP (or of
+  other PSR features in future). A VM may therefore be assigned a COS that
+  exceeds the max COS of one (or more) PSR features but is within the range
+  of the others. For instance, assume the max COS of L2 CAT is 8 and the max
+  COS of L3 CAT is 16. When a VM is assigned COS 9, the L3 CAT CBM associated
+  with COS 9 is enforced, but for L2 CAT the HW behaves as if the default
+  value were set, since COS 9 is beyond the max COS (8) of L2 CAT.
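+
+  The rule in this example can be modelled as a small lookup. The sketch
+  below is not the hypervisor code; `struct feat_sketch` and `effective_cbm`
+  are hypothetical names that only model which CBM takes effect for a given
+  COS ID on one feature.
+
+  ```c
+  #include <stdint.h>
+
+  /* Hypothetical, simplified view of one allocation feature. */
+  struct feat_sketch {
+      unsigned int cos_max;          /* highest COS ID the feature supports */
+      unsigned int cbm_len;          /* CBM length in bits (below 64) */
+      const uint64_t *cos_reg_val;   /* CBM per COS ID, 0..cos_max */
+  };
+
+  /* Return the CBM that takes effect for 'cos' on this feature. */
+  static uint64_t effective_cbm(const struct feat_sketch *f, unsigned int cos)
+  {
+      /* Beyond the feature's max COS the default (all-ones) CBM applies. */
+      if ( cos > f->cos_max )
+          return (1ull << f->cbm_len) - 1;
+
+      return f->cos_reg_val[cos];
+  }
+  ```
+
+  With an L2 feature whose max COS is 8 and an L3 feature whose max COS is 16,
+  looking up COS 9 yields the stored L3 CBM but the default L2 CBM.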
+
+## Design Overview
+
+* Core COS/CBM association
+
+  When enforcing CAT/CDP, all cores of domains have the same default COS (COS0)
+  which is associated with the fully open CBM (all ones bitmask) to access all
+  cache. The default COS is used only in hypervisor and is transparent to tool
+  stack and user.
+
+  The system administrator can change the PSR allocation policy at runtime
+  through the tool stack. Since L2 CAT shares COS with L3 CAT/CDP, a COS
+  corresponds to a 2-tuple [L2 CBM, L3 CBM] when only CAT is enabled. When CDP
+  is enabled, one COS corresponds to a 3-tuple [L2 CBM, L3 Code_CBM,
+  L3 Data_CBM]. If neither L3 CAT nor L3 CDP is enabled, one COS simply
+  corresponds to one L2 CBM.
+
+* VCPU schedule
+
+  When a context switch happens, the COS of the VCPU is written to the
+  per-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces cache allocation
+  according to the corresponding CBM.
+
+* Multi-sockets
+
+  Different sockets may have different CAT/CDP capabilities (e.g. max COS),
+  although the capability is consistent within a socket. So the CAT/CDP
+  capability is recorded per socket.
+
+  'psr-cat-set' can set the CBM of a domain per socket. On each socket, we
+  maintain a COS array for all domains. A domain uses one COS at a time, and
+  that COS holds the CBM in effect for the domain. So, when a VCPU of the
+  domain is migrated from socket 1 to socket 2, it follows the configuration
+  on socket 2.
+
+  E.g. the user sets the CBM of domain 1 on socket 1 to 0x7f, which uses COS 9,
+  but sets the CBM of domain 1 on socket 2 to 0x3f, which uses COS 7. When a
+  VCPU of this domain is migrated from socket 1 to socket 2, the COS ID used
+  becomes 7, so 0x3f is now the CBM in effect for domain 1.
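+
+  The VCPU schedule and multi-socket items above can be combined into one
+  small sketch: on a context switch, the COS ID stored for the domain on the
+  destination socket is placed into the value written to `IA32_PQR_ASSOC`.
+  The sketch assumes the COS field occupies bits 63:32 of that MSR (as we
+  understand the Intel SDM layout) and keeps the low 32 bits unchanged. The
+  structure and function names are hypothetical, and array index 0/1 stands
+  for socket 1/2 of the example.
+
+  ```c
+  #include <stdint.h>
+
+  #define NR_SOCKETS 2   /* assumed socket count for this sketch */
+
+  /* Hypothetical per-domain record: one COS ID per socket. */
+  struct dom_psr_sketch {
+      unsigned int cos[NR_SOCKETS];
+  };
+
+  /*
+   * Build the new IA32_PQR_ASSOC value for a VCPU of domain 'd' that is
+   * about to run on 'socket'. The low 32 bits of the old value are kept.
+   */
+  static uint64_t pqr_assoc_value(uint64_t old,
+                                  const struct dom_psr_sketch *d,
+                                  unsigned int socket)
+  {
+      return (old & 0xffffffffull) | ((uint64_t)d->cos[socket] << 32);
+  }
+  ```
+
+  For the example above, the domain record holds COS 9 for the first socket
+  and COS 7 for the second, so migrating the VCPU changes only the COS field
+  of the value from 9 to 7.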
+
+## Implementation Description
+
+* Hypervisor interfaces:
+
+  1. Boot line parameter "psr=cat" enables L2 CAT and L3 CAT if the hardware
+     supports them. "psr=cdp" enables CDP if the hardware supports it.
+
+  2. SYSCTL:
+          - XEN_SYSCTL_PSR_CAT_get_l3_info: Get L3 CAT/CDP information.
+          - XEN_SYSCTL_PSR_CAT_get_l2_info: Get L2 CAT information.
+
+  3. DOMCTL:
+          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM: Get L3 CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM: Set L3 CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE: Get CDP Code CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE: Set CDP Code CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA: Get CDP Data CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA: Set CDP Data CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM: Get L2 CBM for a domain.
+          - XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM: Set L2 CBM for a domain.
+
+* xl interfaces:
+
+  1. psr-cat-show -lX domain-id
+          Show LX cbm for a domain.
+          => XEN_SYSCTL_PSR_CAT_get_l3_info    /
+             XEN_SYSCTL_PSR_CAT_get_l2_info    /
+             XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM  /
+             XEN_DOMCTL_PSR_CAT_OP_GET_L3_CODE /
+             XEN_DOMCTL_PSR_CAT_OP_GET_L3_DATA /
+             XEN_DOMCTL_PSR_CAT_OP_GET_L2_CBM
+
+  2. psr-cat-set -lX domain-id cbm
+          Set LX cbm for a domain.
+          => XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM  /
+             XEN_DOMCTL_PSR_CAT_OP_SET_L3_CODE /
+             XEN_DOMCTL_PSR_CAT_OP_SET_L3_DATA /
+             XEN_DOMCTL_PSR_CAT_OP_SET_L2_CBM
+
+  3. psr-hwinfo
+          Show PSR HW information, including L3 CAT/CDP/L2 CAT
+          => XEN_SYSCTL_PSR_CAT_get_l3_info /
+             XEN_SYSCTL_PSR_CAT_get_l2_info
+
+* Key data structure:
+
+   1. Feature HW info
+
+      ```
+      struct psr_cat_hw_info {
+          unsigned int cbm_len;
+          unsigned int cos_max;
+      };
+      ```
+
+      - Member `cbm_len`
+
+        `cbm_len` is the CBM length reported by hardware, i.e. the maximum
+        number of bits that can be set in a CBM.
+
+      - Member `cos_max`
+
+        `cos_max` is the maximum COS reported by hardware, which determines
+        how many COS registers are available.
+
+   2. Feature list node
+
+      ```
+      struct feat_node {
+          enum psr_feat_type feature;
+          struct feat_ops ops;
+          struct psr_cat_hw_info info;
+          uint64_t cos_reg_val[MAX_COS_REG_NUM];
+          struct list_head list;
+      };
+      ```
+
+      When a PSR enforcement feature is enabled, it will be added into a
+      feature list. The head of the list is created in psr initialization.
+
+      - Member `feature`
+
+        `feature` is an enumerated value indicating which feature the list
+        entry corresponds to.
+
+      - Member `ops`
+
+        `ops` holds the callback functions of the feature. They are described
+        in detail in `4. Feature operation functions structure`.
+
+      - Member `info`
+
+        `info` holds the feature HW information that is reported by the
+        `psr-hwinfo` command.
+
+      - Member `cos_reg_val`
+
+        `cos_reg_val` is an array to maintain the value set in all COS registers of
+        the feature. The array is indexed by COS ID.
+
+   3. Per-socket PSR features information structure
+
+      ```
+      struct psr_socket_info {
+          unsigned int feat_mask;
+          unsigned int nr_feat;
+          struct list_head feat_list;
+          unsigned int cos_ref[MAX_COS_REG_NUM];
+          spinlock_t ref_lock;
+      };
+      ```
+
+      We collect all PSR allocation features information of a socket in this
+      `struct psr_socket_info`.
+
+      - Member `feat_mask`
+
+        `feat_mask` is a bitmap, to indicate which feature is enabled on current
+        socket. We define `feat_mask` bitmap as:
+
+        bit 0: L3 CAT status.
+        bit 1: L3 CDP status.
+        bit 2: L2 CAT status.
+
+      - Member `nr_feat`
+
+        `nr_feat` means the number of PSR features enabled.
+
+      - Member `cos_ref`
+
+        `cos_ref` is an array of per-COS reference counts. Its entries
+        correspond to those of cos_reg_val[MAX_COS_REG_NUM] in
+        `struct feat_node`. When a domain starts using a COS, the
+        corresponding count is increased by one. When a domain releases the
+        COS, the count is decreased by one. The array is indexed by COS ID.
+
+   4. Feature operation functions structure
+
+      ```
+      struct feat_ops {
+          unsigned int (*get_cos_max)(const struct feat_node *feat);
+          int (*get_feat_info)(const struct feat_node *feat,
+                               uint32_t data[], uint32_t array_len);
+          int (*get_val)(const struct feat_node *feat, unsigned int cos,
+                         enum cbm_type type, uint64_t *val);
+          unsigned int (*get_cos_num)(const struct feat_node *feat);
+          int (*get_old_val)(uint64_t val[],
+                             const struct feat_node *feat,
+                             unsigned int old_cos);
+          int (*set_new_val)(uint64_t val[],
+                             const struct feat_node *feat,
+                             unsigned int old_cos,
+                             enum cbm_type type,
+                             uint64_t m);
+          int (*compare_val)(const uint64_t val[], const struct feat_node *feat,
+                             unsigned int cos, bool *found);
+          unsigned int (*fits_cos_max)(const uint64_t val[],
+                                       const struct feat_node *feat,
+                                       unsigned int cos);
+          int (*write_msr)(unsigned int cos, const uint64_t val[],
+                           struct feat_node *feat);
+      };
+      ```
+
+      The callback functions above encapsulate feature-specific behavior, which
+      makes it easy to add a new feature. We just need to:
+          1) Implement the ops callback functions for the feature.
+          2) Register the ops in `struct feat_node`.
+          3) Add the feature to the feature list during CPU initialization.
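+
+   The reference counting described for `cos_ref` in item 3 can be sketched
+   as follows. This is a simplified model, not the actual implementation: the
+   helper names are hypothetical, the value of `MAX_COS_REG_NUM` is assumed,
+   COS 0 is treated as reserved for the default CBM, and the locking that
+   `ref_lock` would provide is left out.
+
+   ```c
+   #define MAX_COS_REG_NUM 128   /* assumed array size for this sketch */
+
+   /* Hypothetical, reduced version of the per-socket state. */
+   struct cos_ref_sketch {
+       unsigned int cos_ref[MAX_COS_REG_NUM];
+   };
+
+   /* A domain starts using 'cos': take a reference. */
+   static void cos_get(struct cos_ref_sketch *s, unsigned int cos)
+   {
+       s->cos_ref[cos]++;
+   }
+
+   /* A domain stops using 'cos': drop a reference, never below zero. */
+   static void cos_put(struct cos_ref_sketch *s, unsigned int cos)
+   {
+       if ( s->cos_ref[cos] > 0 )
+           s->cos_ref[cos]--;
+   }
+
+   /*
+    * Find an unused COS ID in 1..cos_max. COS 0 is skipped because it
+    * holds the default CBM. Returns -1 if every COS is in use.
+    */
+   static int cos_find_free(const struct cos_ref_sketch *s,
+                            unsigned int cos_max)
+   {
+       unsigned int cos;
+
+       for ( cos = 1; cos <= cos_max && cos < MAX_COS_REG_NUM; cos++ )
+           if ( s->cos_ref[cos] == 0 )
+               return (int)cos;
+
+       return -1;
+   }
+   ```
+
+   A COS whose count drops back to zero becomes available again, so it can be
+   handed to another domain that needs a different CBM.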
+
+# Limitations
+
+CAT/CDP only works on HW that supports it (check by CPUID). So far, there is
+no HW that supports both L2 CAT and L3 CAT/CDP, but the SW implementation
+already handles the scenario where both are enabled.
+
+# Testing
+
+We can execute the above xl commands to verify L2 CAT and L3 CAT/CDP on
+different HW that supports them.
+
+For example:
+    root@:~$ xl psr-hwinfo --cat
+    Cache Allocation Technology (CAT): L2
+    Socket ID       : 0
+    Maximum COS     : 3
+    CBM length      : 8
+    Default CBM     : 0xff
+
+    root@:~$ xl psr-cat-set -l2 1 0x7f
+
+    root@:~$ xl psr-cat-show -l2 1
+    Socket ID       : 0
+    Default CBM     : 0xff
+       ID                     NAME             CBM
+        1                 ubuntu14            0x7f
+
+# Areas for improvement
+
+N/A
+
+# Known issues
+
+N/A
+
+# References
+
+"INTEL® RESOURCE DIRECTOR TECHNOLOGY (INTEL® RDT) ALLOCATION FEATURES" [Intel® 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2016-08-12 1.0      Xen 4.9  Design document written
+---------- -------- -------- -------------------------------------------