[RFC,0/3] SCMI Vhost and Virtio backend implementation

Message ID	20220609071956.5183-1-quic_neeraju@quicinc.com (mailing list archive)
Headers	show Return-Path: <kvm-owner@kernel.org> From: Neeraj Upadhyay <quic_neeraju@quicinc.com> To: <mst@redhat.com>, <jasowang@redhat.com>, <sudeep.holla@arm.com>, <cristian.marussi@arm.com> CC: <quic_sramana@quicinc.com>, <vincent.guittot@linaro.org>, <linux-arm-kernel@lists.infradead.org>, <kvm@vger.kernel.org>, <virtualization@lists.linux-foundation.org>, <linux-kernel@vger.kernel.org>, Neeraj Upadhyay <quic_neeraju@quicinc.com> Subject: [RFC 0/3] SCMI Vhost and Virtio backend implementation Date: Thu, 9 Jun 2022 12:49:53 +0530 Message-ID: <20220609071956.5183-1-quic_neeraju@quicinc.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk
Series	SCMI Vhost and Virtio backend implementation \| expand [RFC,0/3] SCMI Vhost and Virtio backend implementation [RFC,1/3] dt-bindings: arm: Add document for SCMI Virtio backend device [RFC,2/3] firmware: Add ARM SCMI Virtio backend implementation [RFC,3/3] vhost/scmi: Add Host kernel accelerator for Virtio SCMI

Message ID

20220609071956.5183-1-quic_neeraju@quicinc.com (mailing list archive)

Headers

From: Neeraj Upadhyay <quic_neeraju@quicinc.com>
To: <mst@redhat.com>, <jasowang@redhat.com>, <sudeep.holla@arm.com>,
        <cristian.marussi@arm.com>
CC: <quic_sramana@quicinc.com>, <vincent.guittot@linaro.org>,
        <linux-arm-kernel@lists.infradead.org>, <kvm@vger.kernel.org>,
        <virtualization@lists.linux-foundation.org>,
        <linux-kernel@vger.kernel.org>,
        Neeraj Upadhyay <quic_neeraju@quicinc.com>
Subject: [RFC 0/3] SCMI Vhost and Virtio backend implementation
Date: Thu, 9 Jun 2022 12:49:53 +0530
Message-ID: <20220609071956.5183-1-quic_neeraju@quicinc.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk

Series

SCMI Vhost and Virtio backend implementation | expand

Message

Neeraj Upadhyay June 9, 2022, 7:19 a.m. UTC

This RFC series, provides ARM System Control and Management Interface (SCMI)
protocol backend implementation for Virtio transport. The purpose of this
feature is to provide para-virtualized interfaces to guest VMs, to various
hardware blocks like clocks, regulators. This allows the guest VMs to
communicate their resource needs to the host, in the absence of direct
access to those resources.

1. Architecture overview
---------------------

Below diagram shows the overall software architecture of SCMI communication
between guest VM and the host software. In this diagram, guest is a linux
VM; also, host uses KVM linux.

         GUEST VM                   HOST
 +--------------------+    +---------------------+    +--------------+
 |   a. Device A      |    |   k. Device B       |    |      PLL     |
 |  (Clock consumer)  |    |  (Clock consumer)   |    |              |
 +--------------------+    +---------------------+    +--------------+
          |                         |                         ^
          v                         v                         |
 +--------------------+    +---------------------+    +-----------------+
 | b. Clock Framework |    | j. Clock Framework  | -->| l. Clock Driver |
 +-- -----------------+    +---------------------+    +-----------------+
          |                         ^
          v                         |
 +--------------------+    +------------------------+
 |  c. SCMI Clock     |    | i. SCMI Virtio Backend |
 +--------------------+    +------------------------+ 
          |                         ^
          v                         |
 +--------------------+    +----------------------+
 |  d. SCMI Virtio    |    |   h. SCMI Vhost      |<-----------+
 +--------------------+    +----------------------+            |
          |                         ^                          |
          v                         |                          |
+-------------------------------------------------+    +-----------------+
|              e. Virtio Infra                    |    |    g. VMM       |
+-------------------------------------------------+    +-----------------+
          |                         ^                           ^
          v                         |                           |
+-------------------------------------------------+             |
|                f. Hypervisor                    |-------------
+-------------------------------------------------+

a. Device A             This is the client kernel driver in guest VM,
                        for ex. diplay driver, which uses standard
                        clock framework APIs to vote for a clock.

b. Clock Framework      Underlying kernel clock framework on
                        guest.

c. SCMI Clock           SCMI interface based clock driver.

d. SCMI Virtio          Underlying SCMI framework, using Virtio as
                        transport driver.

e. Virtio Infra         Virtio drivers on guest VM. These drivers
                        initiate virtqueue requests over Virtio
                        transport (MMIO/PCI), and forwards response
                        to SCMI Virtio registered callbacks.

f. Hypervisor           Hosted Hypervisor (KVM for ex.), which traps
                        and forwards requests on virtqueue ring
                        buffers to the VMM.

g. VMM                  Virtual Machine Monitor, running on host userspace,
                        which manages the lifecycle of guest VMs, and forwards
                        guest initiated virtqueue requests as IOCTLs to the
                        Vhost driver on host.

h. SCMI Vhost           In kernel driver, which handles SCMI virtqueue
                        requests from guest VMs. This driver forwards the
                        requests to SCMI Virtio backend driver, and returns
                        the response from backend, over the virtqueue ring
                        buffers.

i. SCMI Virtio Backend  SCMI backend, which handles the incoming SCMI messages
                        from SCMI Vhost driver, and forwards them to the
                        backend protocols like clock and voltage protocols.
                        The backend protocols uses the host apis for those
                        resources like clock APIs provided by clock framework,
                        to vote/request for the resource. The response from
                        the host api is parceled into a SCMI response message,
                        and is returned to the SCMI Vhost driver. The SCMI
                        Vhost driver in turn, returns the reponse over the
                        Virtqueue reponse buffers.

j. Clock Framework      Clock framework on the host, which is used by
                        clients/drivers running on the host, to vote/request
                        for clocks.

k. Device B             Native driver running on host, which acts as a
                        consumer of one of the clocks.

l. Clock Driver         Underlying Clock driver, which programs the
                        corresponding hardware PLL, for a clock request, or
                        forwards the request to a SCP controller, over
                        SCMI channel between host and the controller.


2. SCMI Vhost and Virtio backend
--------------------------------

Below description provides information on few key aspects of handling SCMI
requests received over Virtio channel, at host.

2.1 VMM Support
---------------

VMM need to provide support for SCMI vhost device setup. Below
description outlines the steps which VMM need to do.

a. VMM invokes `open()` on `/dev/vhost-scmi`, when a new VM is started.

b. VMM calls below ioctls on the SCMI Vhost fd, for the VM, to setup
   Virtqueue ring buffers and eventfd, irqfd. P2A and SHARED_MEMORY
   SCMI features should not be set.

   ioctl(sdev->vhost_fd, VHOST_SET_OWNER);
   ioctl(sdev->vhost_fd, VHOST_GET_FEATURES, &features);
   ioctl(sdev->vhost_fd, VHOST_SET_FEATURES, &features);
   ioctl(sdev->vhost_fd, VHOST_SET_MEM_TABLE, mem);

   ioctl(sdev->vhost_fd, VHOST_SET_VRING_KICK, &file)
   ioctl(ndev->vhost_fd, VHOST_SET_VRING_CALL, &file)
   ioctl(sdev->vhost_fd, VHOST_SET_VRING_NUM, &state)
   ioctl(sdev->vhost_fd, VHOST_SET_VRING_BASE, &state)
   ioctl(sdev->vhost_fd, VHOST_SET_VRING_ADDR, &addr)

   ioctl(sdev->vhost_fd, VHOST_SCMI_SET_RUNNING, &on)

c. VMM invokes `close()` on the fd corresponding to `open()` call above
   for the VM, when that VM shuts down or crashes.

2.2 Client Handle
-------------------

Each guest VM client is identified using a client handle, which is
declared as below:


    1 struct scmi_vio_client_h {
    2     const void *handle;
    3 };

``->handle`` is an opaque pointer, which is initialized by the SCMI Vhost
driver for each guest VM, SCMI Virtio connection.

Client handles are allocated using ``scmi_vio_get_client_h()``
and freed using api ``scmi_vio_put_client_h()``.

``scmi_vio_get_client_h()`` encapsulates the ``scmi_vio_client_h``
handle in a ``scmi_vio_client_info`` structure, which is declared as:

::
    1 struct scmi_vio_client_info {
    2     struct scmi_vio_client_h client_h;
    3     void *priv;
    4 };

``->priv`` member provides a way for the next software layer (SCMI VIO
backend), to save per VM information, using apis -
``scmi_vio_set_client_priv()`` and ``scmi_vio_get_client_priv()``.
This information is used by the backend, to maintain bookkeeping
information for a VM - like the per protocol active requests for it.
This bookkeeping information can be used during VM teardown, to
release any requests/votes active for that VM.


Below is a pictorial representation of how the handle information is
mapped in each software component at host.


SCMI Vhost               SCMI VIO backend                   SCMI Backend Protocols

+----------------+      +----------------------+          +-->+---------------------+
|   *priv        |  +---| backend_protocol_map |          |   |    Protocol 0x10    |
+----------------+  |   +----------------------+          |   |    Client data      |
|  Client_h      |  |   | Client_h             |          |   +---------------------+
+----------------+  |   +----------------------+          |   |    Client_h         |
                    |                                     |   +---------------------+
                    |                                     |
                    |a. Backend stores an IDR map         |b. IDR member for a protocol
                    |   in the priv member                |   points to the private
                    |                                     |   data maintained by that
                    +-->+------------------------------+  |   protocol, for the client.
                        | 0x10   |  protocol_0x10-priv |--+
                        +------------------------------+
                        | 0x14   |  protocol_0x14-priv |----->+---------------------+
                        +------------------------------+      |     Protocol 0x14   |
                        |            ....              |      |     Client data     |
                        +------------------------------+      +---------------------+
                                                              |     Client_h        |
                                                              +---------------------+

2.3 Communication between Vhost and backend
-------------------------------------------

a. (Creation) During VM creation, VMM calls ``open()`` on the /dev/vhost-scmi
   node, which is exposed by the SCMI Vhost driver.

   As part of ``open`` call, SCMI Vhost driver initializes the host side
   Virtio interface for the guest VM. This initialization includes setup
   of:

   * Setting up Vhost virtqueus, tx/rx handler and registering those with
     the underlying Vhost framewrk.
   * Allocating a client handle for the VM.
   * ``scmi_vio_be_open()`` call to the SCMI Virtio backend driver.

   ``scmi_vio_be_open()`` initializes all active backend protocols as follows:

   * Allocates a new client handle, encapsulating the original client
     handle from Vhost layer.
   * Calls ``->open`` for the protocol, with the new handle allocated
     for that protocol. As part of the ``->open`` call, protocol callback
     stores its own bookkeeping information into the client handle's
     private data.
   * Allocates an IDR entry and stores the protocol-id -> protocol-client-handle
     mapping in the ``backend_protocol_map``.

b. (Message request handling) SCMI Vhost driver polls on the eventfd for a
   guest VM for any SCMI request messages. On incoming SCMI requests from
   the client Virtio (Guest VM), it does following:

   * Retrieve the request/response descriptor entries from the descriptor
     table, for the virqueues set up for the VM's SCMI Virtio transport.

   * Copy the request message from the descriptor entry's (addr, length)
     information into the request buffer maintained by Vhost for that VM,
     and forward the message to the client, by calling
     ``scmi_vio_be_request()`` with the client handle for the VM, and
     request and response buffers information.

   * ``scmi_vio_be_request()`` function, unpacks the message header
     from the request buffer, identifies the protocol, and forwards the
     request and response payload buffers to the protocol specific
     ``->msg_handle()``.

   * Backend Protocol layer calls the host side framework api to request
     the resource, like any other consumer driver running on the host.
     For ex. ``clock_prepare_enable()`` call, for the ``CLOCK_CONFIG_SET``
     clock protocol SCMI request message from the client. Return
     value from the host framework (``clock_prepare_enable()`` api in
     this example), is remapped to a SCMI status code, and returned to
     the SCMI VIO backend driver.

   * The Backend VIO driver packs the response status code and payload
     into the response buffer and returns from the ``scmi_vio_be_request()``
     call.

   * SCMI Vhost driver, on return from ``scmi_vio_be_request()`` call,
     copies the response buffer to the virtqueue descriptor (addr, length)
     entry for the response. It then signals the used entry to the vhost
     framework. This results in request completion interrupt signaling
     over irqfd to the VM.

c. (Teardown) As part of VM shutdown, VMM calls ``close()`` on the
   ``/dev/vhost-scmi`` file handle for the VM.

   ``->release()`` callback handler in SCMI vhost does following:

   * Flush all inflight requests for the VM.
   * Cleanup Vhost dev resources for the VM.
   * Call ``scmi_vio_be_close()`` with the client handle as argument.

   ``scmi_vio_be_close()`` does following cleanup:

   * Call ``->close`` for all active protocols, with the client handle
     in the IDR map, for that client-protocol mapping.
     As part of ``->close()` handler, protocol releases the resources
     for that VM. For example, unvoting any voted clocks, regulators.

   * Destroy the client handles for those protocols and free the
     private IDR map for the client.

Neeraj Upadhyay (3):
  dt-bindings: arm: Add document for SCMI Virtio backend device
  firmware: Add ARM SCMI Virtio backend implementation
  vhost/scmi: Add Host kernel accelerator for Virtio SCMI

 .../firmware/arm,scmi-vio-backend.yaml        |  85 +++
 drivers/firmware/Kconfig                      |   9 +
 drivers/firmware/arm_scmi/Makefile            |   1 +
 drivers/firmware/arm_scmi/base.c              |  12 -
 drivers/firmware/arm_scmi/common.h            |  29 +
 drivers/firmware/arm_scmi/msg.c               |  11 -
 drivers/firmware/arm_scmi/virtio.c            |   3 -
 .../firmware/arm_scmi/virtio_backend/Makefile |   5 +
 .../arm_scmi/virtio_backend/backend.c         | 516 ++++++++++++++++++
 .../arm_scmi/virtio_backend/backend.h         |  20 +
 .../virtio_backend/backend_protocol.h         |  93 ++++
 .../firmware/arm_scmi/virtio_backend/base.c   | 474 ++++++++++++++++
 .../arm_scmi/virtio_backend/client_handle.c   |  71 +++
 .../arm_scmi/virtio_backend/client_handle.h   |  24 +
 .../firmware/arm_scmi/virtio_backend/common.h |  53 ++
 drivers/vhost/Kconfig                         |  10 +
 drivers/vhost/Makefile                        |   3 +
 drivers/vhost/scmi.c                          | 466 ++++++++++++++++
 include/linux/scmi_vio_backend.h              |  31 ++
 include/uapi/linux/vhost.h                    |   3 +
 20 files changed, 1893 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/firmware/arm,scmi-vio-backend.yaml
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/Makefile
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/backend.c
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/backend.h
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/backend_protocol.h
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/base.c
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/client_handle.c
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/client_handle.h
 create mode 100644 drivers/firmware/arm_scmi/virtio_backend/common.h
 create mode 100644 drivers/vhost/scmi.c
 create mode 100644 include/linux/scmi_vio_backend.h

Comments

Cristian Marussi June 13, 2022, 5:20 p.m. UTC | #1

+CC: Souvik

On Thu, Jun 09, 2022 at 12:49:53PM +0530, Neeraj Upadhyay wrote:
> This RFC series, provides ARM System Control and Management Interface (SCMI)
> protocol backend implementation for Virtio transport. The purpose of this

Hi Neeraj,

Thanks for this work, I only glanced through the series at first to
grasp a general understanding of it (without goind into much details for
now) and I'd have a few questions/concerns that I'll noted down below.

I focused mainly on the backend server aims/functionalities/issues ignoring
at first the vhost-scmi entry-point since the vost-scmi accelerator is just
a (more-or-less) standard means of configuring and grabbing SCMI traffic
from the VMs into the Host Kernel and so I found more interesting at first
to understand what we can do with such traffic at first.
(IOW the vhost-scmi layer is welcome but remain to see what to do with it...)

> feature is to provide para-virtualized interfaces to guest VMs, to various
> hardware blocks like clocks, regulators. This allows the guest VMs to
> communicate their resource needs to the host, in the absence of direct
> access to those resources.

In an SCMI stack the agents (like VMs) issue requests to an SCMI platform
backend that is in charge of policying and armonizing such requests
eventually denying some of these (possibly malicious) while allowing others
(possibly armonizing/merging such reqs); with your solution basically the
SCMI backend in Kernel marshals/conveys all of such SCMI requests to the
proper Linux Kernel subsystem that is usually in charge of it, using
dedicated protocol handlers that basically translates SCMI requests to
Linux APIs calls to the Host. (I may have oversimplified or missed
something...)

At the price of a bit of overhead and code-duplication introduced by
this SCMI Backend you can indeed leverage the existing mechanisms for
resource accounting and sharing included in such Linux subsystems (like
Clock framework), and that's nice and useful, BUT how do you policy/filter
(possibly dinamically as VMs come and go) what these VMs can see and do
with these resources ?

... MORE importantly how do you protect the Host (or another VM) from
unacceptable (or possibly malicious) requests conveyed from one VM request
vqueue into the Linux subsystems (like clocks) ?

I saw you have added a good deal of DT bindings for the backend
describing protocols, so you could just expose only some protocols via
the backend (if I get it right) but you cannot anyway selectively expose
only a subset of resources to the different agents, so, if you expose the
clock protocol, that will be visible by any VMs and an agent could potentially
kill the Host or mount some clock related attack acting on the right clock.
(I mean you cannot describe in the Host DT a number X of clocks to be
supported by the Host Linux Clock framework BUT then expose selectively to
the SCMI agents only a subset Y < X to shield the Host from misbehaviour...
...at least not in a dynamic way avoiding to bake a fixed policy into
the backend...or maybe I'm missing how you can do that, in such a case
please explain...)

Moreover, in a normal SCMI stack the server resides out of reach from the
OSPM agents since the server, wherever it sits, has the last word and can
deny and block unreasonable/malicious requests while armonizing others: this
means the typical SCMI platform fw is configured in such a way that clearly
defines a set of policies to be enforced between the access of the various
agents. (and it can reside in the trusted codebase given its 'reduced'
size...even though this policies are probably at the moment not so
dynamically modificable there either...)

With your approach of a Linux Kernel based SCMI platform backend you are
certainly using all the good and well proven mechanisms offered by the
Kernel to share and co-ordinate access to such resources, which is good
(.. even though Linux is not so small in term of codebase to be used as
a TCB to tell the truth :D), BUT I don't see the same level of policying
or filtering applied anywhere in the proposed RFCs, especially to protect
the Host which at the end is supposed to use the same Linux subsystems and
possibly share some of those resources for its own needs.

I saw the Base protocol basic implementation you provided to expose the
supported backend protocols to the VMs, it would be useful to see how
you plan to handle something like the Clock protocol you mention in the
example below. (if you have Clock protocol backend that as WIP already
would be interesting to see it...)

Another issue/criticality that comes to my mind is how do you gather in
general basic resources states/descriptors from the existing Linux subsystems
(even leaving out any policying concerns): as an example, how do you gather
from the Host Clock framework the list of available clocks and their rates
descriptors that you're going expose to a specific VMs once this latter will
issue the related SCMI commands to get to know which SCMI Clock domain are
available ?
(...and I mean in a dynamic way not using a builtin per-platform baked set of
 resources known to be made available... I doubt that any sort of DT
 description would be accepted in this regards ...)

> 
> 1. Architecture overview
> ---------------------
> 
> Below diagram shows the overall software architecture of SCMI communication
> between guest VM and the host software. In this diagram, guest is a linux
> VM; also, host uses KVM linux.
> 
>          GUEST VM                   HOST
>  +--------------------+    +---------------------+    +--------------+
>  |   a. Device A      |    |   k. Device B       |    |      PLL     |
>  |  (Clock consumer)  |    |  (Clock consumer)   |    |              |
>  +--------------------+    +---------------------+    +--------------+
>           |                         |                         ^
>           v                         v                         |
>  +--------------------+    +---------------------+    +-----------------+
>  | b. Clock Framework |    | j. Clock Framework  | -->| l. Clock Driver |
>  +-- -----------------+    +---------------------+    +-----------------+
>           |                         ^
>           v                         |
>  +--------------------+    +------------------------+
>  |  c. SCMI Clock     |    | i. SCMI Virtio Backend |
>  +--------------------+    +------------------------+ 
>           |                         ^
>           v                         |
>  +--------------------+    +----------------------+
>  |  d. SCMI Virtio    |    |   h. SCMI Vhost      |<-----------+
>  +--------------------+    +----------------------+            |
>           |                         ^                          |
>           v                         |                          |
> +-------------------------------------------------+    +-----------------+
> |              e. Virtio Infra                    |    |    g. VMM       |
> +-------------------------------------------------+    +-----------------+
>           |                         ^                           ^
>           v                         |                           |
> +-------------------------------------------------+             |
> |                f. Hypervisor                    |-------------
> +-------------------------------------------------+
> 

Looking at the above schema and thinking out loud where any dynamic
policying against the resources can fit (..and trying desperately NOT to push
that into the Kernel too :P...) ... I think that XEN was trying something similar
(with a real backend SCMI platform FW at the end of the pipe though I think...) and
in their case the per-VMs resource allocation was performed using SCMI
BASE_SET_DEVICE_PERMISSIONS commands issued by the Hypervisor/VMM itself
I think or by a Dom0 elected as a trusted agent and so allowed to configure
such resource partitioning ...

https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg113868.html

...maybe a similar approach, with some sort of SCMI Trusted Agent living within
the VMM and in charge of directing such resources' partitioning between
VMs by issuing BASE_SET_DEVICE_PERMISSIONS towards the Kernel SCMI Virtio
Backend, could help keeping at least the policy bits related to the VMs out of
the kernel/DTs and possibly dynamically configurable following VMs lifecycle.

Even though, in our case ALL the resource management by device ID would have to
happen in the Kernel SCMI backend at the end, given that is where the SCMI
platform resides indeed, BUT at least you could keep the effective policy out of
kernel space, doing something like:

1. VMM/TrustedAgent query Kernel_SCMI_Virtio_backend for available resources

2. VMM/TrustedAg decides resources allocation between VMs (and/or possibly the Host
   based on some configured policy)

3. VMM/TrustedAgent issues BASE_SET_DEVICE_PERMISSIONS/PROTOCOLS to the
   Kernel_SCMI_Virtio_backend

4. Kernel_SCMI_Virtio_backend enforces resource partioning and sharing
   when processing subsequent VMs SCMI requests coming via Vhost-SCMI

...where the TrustedAgent here could be (I guess) the VMM or the Host or
both with different level of privilege if you don't want the VMM to be able
to configure resources access for the whole Host.

> a. Device A             This is the client kernel driver in guest VM,
>                         for ex. diplay driver, which uses standard
>                         clock framework APIs to vote for a clock.
> 
> b. Clock Framework      Underlying kernel clock framework on
>                         guest.
> 
> c. SCMI Clock           SCMI interface based clock driver.
> 
> d. SCMI Virtio          Underlying SCMI framework, using Virtio as
>                         transport driver.
> 
> e. Virtio Infra         Virtio drivers on guest VM. These drivers
>                         initiate virtqueue requests over Virtio
>                         transport (MMIO/PCI), and forwards response
>                         to SCMI Virtio registered callbacks.
> 
> f. Hypervisor           Hosted Hypervisor (KVM for ex.), which traps
>                         and forwards requests on virtqueue ring
>                         buffers to the VMM.
> 
> g. VMM                  Virtual Machine Monitor, running on host userspace,
>                         which manages the lifecycle of guest VMs, and forwards
>                         guest initiated virtqueue requests as IOCTLs to the
>                         Vhost driver on host.
> 
> h. SCMI Vhost           In kernel driver, which handles SCMI virtqueue
>                         requests from guest VMs. This driver forwards the
>                         requests to SCMI Virtio backend driver, and returns
>                         the response from backend, over the virtqueue ring
>                         buffers.
> 
> i. SCMI Virtio Backend  SCMI backend, which handles the incoming SCMI messages
>                         from SCMI Vhost driver, and forwards them to the
>                         backend protocols like clock and voltage protocols.
>                         The backend protocols uses the host apis for those
>                         resources like clock APIs provided by clock framework,
>                         to vote/request for the resource. The response from
>                         the host api is parceled into a SCMI response message,
>                         and is returned to the SCMI Vhost driver. The SCMI
>                         Vhost driver in turn, returns the reponse over the
>                         Virtqueue reponse buffers.
> 

Last but not least, this SCMI Virtio Backend layer in charge of
processing incoming SCMI packets, interfacing with the Linux subsystems
final backend and building SCMI replies from Linux will introduce a
certain level of code/funcs duplication given that this same SCMI basic
processing capabilities have been already baked in the SCMI stacks found in
SCP and in TF-A (.. and maybe a few other other proprietary backends)...

... but this is something maybe to be addressed in general in a
different context not something that can be addressed by this series.

Sorry for the usual flood of words :P ... I'll have a more in deep
review of the series in the next days, for now I wanted just to share my
concerns and (maybe wrong) understanding and see what you or Sudeep and
Souvik think about.

Thanks,
Cristian

Vincent Guittot June 30, 2022, 1:54 p.m. UTC | #2

Hi Neeraj and Cristian,

On Mon, 13 Jun 2022 at 19:20, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> +CC: Souvik
>
> On Thu, Jun 09, 2022 at 12:49:53PM +0530, Neeraj Upadhyay wrote:
> > This RFC series, provides ARM System Control and Management Interface (SCMI)
> > protocol backend implementation for Virtio transport. The purpose of this
>
> Hi Neeraj,
>
> Thanks for this work, I only glanced through the series at first to
> grasp a general understanding of it (without goind into much details for
> now) and I'd have a few questions/concerns that I'll noted down below.
>
> I focused mainly on the backend server aims/functionalities/issues ignoring
> at first the vhost-scmi entry-point since the vost-scmi accelerator is just
> a (more-or-less) standard means of configuring and grabbing SCMI traffic
> from the VMs into the Host Kernel and so I found more interesting at first
> to understand what we can do with such traffic at first.
> (IOW the vhost-scmi layer is welcome but remain to see what to do with it...)
>
> > feature is to provide para-virtualized interfaces to guest VMs, to various
> > hardware blocks like clocks, regulators. This allows the guest VMs to
> > communicate their resource needs to the host, in the absence of direct
> > access to those resources.

IIUC, you want to leverage on the drivers already developed in the
kernel for those resources instead of developing a dedicated SCMI
server. The main concern is that this also provides full access to
these resources from userspace without any control which is a concern
also described by Cristian below. It would be good to describe how you
want to manage resources availability and permission access.This is
the main open point in your RFC so far

>
> In an SCMI stack the agents (like VMs) issue requests to an SCMI platform
> backend that is in charge of policying and armonizing such requests
> eventually denying some of these (possibly malicious) while allowing others
> (possibly armonizing/merging such reqs); with your solution basically the
> SCMI backend in Kernel marshals/conveys all of such SCMI requests to the
> proper Linux Kernel subsystem that is usually in charge of it, using
> dedicated protocol handlers that basically translates SCMI requests to
> Linux APIs calls to the Host. (I may have oversimplified or missed
> something...)
>
> At the price of a bit of overhead and code-duplication introduced by
> this SCMI Backend you can indeed leverage the existing mechanisms for
> resource accounting and sharing included in such Linux subsystems (like
> Clock framework), and that's nice and useful, BUT how do you policy/filter
> (possibly dinamically as VMs come and go) what these VMs can see and do
> with these resources ?
>
> ... MORE importantly how do you protect the Host (or another VM) from
> unacceptable (or possibly malicious) requests conveyed from one VM request
> vqueue into the Linux subsystems (like clocks) ?
>
> I saw you have added a good deal of DT bindings for the backend
> describing protocols, so you could just expose only some protocols via
> the backend (if I get it right) but you cannot anyway selectively expose
> only a subset of resources to the different agents, so, if you expose the
> clock protocol, that will be visible by any VMs and an agent could potentially
> kill the Host or mount some clock related attack acting on the right clock.
> (I mean you cannot describe in the Host DT a number X of clocks to be
> supported by the Host Linux Clock framework BUT then expose selectively to
> the SCMI agents only a subset Y < X to shield the Host from misbehaviour...
> ...at least not in a dynamic way avoiding to bake a fixed policy into
> the backend...or maybe I'm missing how you can do that, in such a case
> please explain...)
>
> Moreover, in a normal SCMI stack the server resides out of reach from the
> OSPM agents since the server, wherever it sits, has the last word and can
> deny and block unreasonable/malicious requests while armonizing others: this
> means the typical SCMI platform fw is configured in such a way that clearly
> defines a set of policies to be enforced between the access of the various
> agents. (and it can reside in the trusted codebase given its 'reduced'
> size...even though this policies are probably at the moment not so
> dynamically modificable there either...)
>
> With your approach of a Linux Kernel based SCMI platform backend you are
> certainly using all the good and well proven mechanisms offered by the
> Kernel to share and co-ordinate access to such resources, which is good
> (.. even though Linux is not so small in term of codebase to be used as
> a TCB to tell the truth :D), BUT I don't see the same level of policying
> or filtering applied anywhere in the proposed RFCs, especially to protect
> the Host which at the end is supposed to use the same Linux subsystems and
> possibly share some of those resources for its own needs.
>
> I saw the Base protocol basic implementation you provided to expose the
> supported backend protocols to the VMs, it would be useful to see how
> you plan to handle something like the Clock protocol you mention in the
> example below. (if you have Clock protocol backend that as WIP already
> would be interesting to see it...)
>
> Another issue/criticality that comes to my mind is how do you gather in
> general basic resources states/descriptors from the existing Linux subsystems
> (even leaving out any policying concerns): as an example, how do you gather
> from the Host Clock framework the list of available clocks and their rates
> descriptors that you're going expose to a specific VMs once this latter will
> issue the related SCMI commands to get to know which SCMI Clock domain are
> available ?
> (...and I mean in a dynamic way not using a builtin per-platform baked set of
>  resources known to be made available... I doubt that any sort of DT
>  description would be accepted in this regards ...)
>
> >
> > 1. Architecture overview
> > ---------------------
> >
> > Below diagram shows the overall software architecture of SCMI communication
> > between guest VM and the host software. In this diagram, guest is a linux
> > VM; also, host uses KVM linux.
> >
> >          GUEST VM                   HOST
> >  +--------------------+    +---------------------+    +--------------+
> >  |   a. Device A      |    |   k. Device B       |    |      PLL     |
> >  |  (Clock consumer)  |    |  (Clock consumer)   |    |              |
> >  +--------------------+    +---------------------+    +--------------+
> >           |                         |                         ^
> >           v                         v                         |
> >  +--------------------+    +---------------------+    +-----------------+
> >  | b. Clock Framework |    | j. Clock Framework  | -->| l. Clock Driver |
> >  +-- -----------------+    +---------------------+    +-----------------+
> >           |                         ^
> >           v                         |
> >  +--------------------+    +------------------------+
> >  |  c. SCMI Clock     |    | i. SCMI Virtio Backend |
> >  +--------------------+    +------------------------+
> >           |                         ^
> >           v                         |
> >  +--------------------+    +----------------------+
> >  |  d. SCMI Virtio    |    |   h. SCMI Vhost      |<-----------+
> >  +--------------------+    +----------------------+            |
> >           |                         ^                          |
> >           v                         |                          |
> > +-------------------------------------------------+    +-----------------+
> > |              e. Virtio Infra                    |    |    g. VMM       |
> > +-------------------------------------------------+    +-----------------+
> >           |                         ^                           ^
> >           v                         |                           |
> > +-------------------------------------------------+             |
> > |                f. Hypervisor                    |-------------
> > +-------------------------------------------------+
> >
>
> Looking at the above schema and thinking out loud where any dynamic
> policying against the resources can fit (..and trying desperately NOT to push
> that into the Kernel too :P...) ... I think that XEN was trying something similar
> (with a real backend SCMI platform FW at the end of the pipe though I think...) and
> in their case the per-VMs resource allocation was performed using SCMI
> BASE_SET_DEVICE_PERMISSIONS commands issued by the Hypervisor/VMM itself
> I think or by a Dom0 elected as a trusted agent and so allowed to configure
> such resource partitioning ...
>
> https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg113868.html
>
> ...maybe a similar approach, with some sort of SCMI Trusted Agent living within
> the VMM and in charge of directing such resources' partitioning between
> VMs by issuing BASE_SET_DEVICE_PERMISSIONS towards the Kernel SCMI Virtio
> Backend, could help keeping at least the policy bits related to the VMs out of
> the kernel/DTs and possibly dynamically configurable following VMs lifecycle.
>
> Even though, in our case ALL the resource management by device ID would have to
> happen in the Kernel SCMI backend at the end, given that is where the SCMI
> platform resides indeed, BUT at least you could keep the effective policy out of
> kernel space, doing something like:
>
> 1. VMM/TrustedAgent query Kernel_SCMI_Virtio_backend for available resources
>
> 2. VMM/TrustedAg decides resources allocation between VMs (and/or possibly the Host
>    based on some configured policy)
>
> 3. VMM/TrustedAgent issues BASE_SET_DEVICE_PERMISSIONS/PROTOCOLS to the
>    Kernel_SCMI_Virtio_backend
>
> 4. Kernel_SCMI_Virtio_backend enforces resource partioning and sharing
>    when processing subsequent VMs SCMI requests coming via Vhost-SCMI
>
> ...where the TrustedAgent here could be (I guess) the VMM or the Host or
> both with different level of privilege if you don't want the VMM to be able
> to configure resources access for the whole Host.
>
> > a. Device A             This is the client kernel driver in guest VM,
> >                         for ex. diplay driver, which uses standard
> >                         clock framework APIs to vote for a clock.
> >
> > b. Clock Framework      Underlying kernel clock framework on
> >                         guest.
> >
> > c. SCMI Clock           SCMI interface based clock driver.
> >
> > d. SCMI Virtio          Underlying SCMI framework, using Virtio as
> >                         transport driver.
> >
> > e. Virtio Infra         Virtio drivers on guest VM. These drivers
> >                         initiate virtqueue requests over Virtio
> >                         transport (MMIO/PCI), and forwards response
> >                         to SCMI Virtio registered callbacks.
> >
> > f. Hypervisor           Hosted Hypervisor (KVM for ex.), which traps
> >                         and forwards requests on virtqueue ring
> >                         buffers to the VMM.
> >
> > g. VMM                  Virtual Machine Monitor, running on host userspace,
> >                         which manages the lifecycle of guest VMs, and forwards
> >                         guest initiated virtqueue requests as IOCTLs to the
> >                         Vhost driver on host.
> >
> > h. SCMI Vhost           In kernel driver, which handles SCMI virtqueue
> >                         requests from guest VMs. This driver forwards the
> >                         requests to SCMI Virtio backend driver, and returns
> >                         the response from backend, over the virtqueue ring
> >                         buffers.
> >
> > i. SCMI Virtio Backend  SCMI backend, which handles the incoming SCMI messages
> >                         from SCMI Vhost driver, and forwards them to the
> >                         backend protocols like clock and voltage protocols.
> >                         The backend protocols uses the host apis for those
> >                         resources like clock APIs provided by clock framework,
> >                         to vote/request for the resource. The response from
> >                         the host api is parceled into a SCMI response message,
> >                         and is returned to the SCMI Vhost driver. The SCMI
> >                         Vhost driver in turn, returns the reponse over the
> >                         Virtqueue reponse buffers.
> >
>
> Last but not least, this SCMI Virtio Backend layer in charge of
> processing incoming SCMI packets, interfacing with the Linux subsystems
> final backend and building SCMI replies from Linux will introduce a
> certain level of code/funcs duplication given that this same SCMI basic
> processing capabilities have been already baked in the SCMI stacks found in
> SCP and in TF-A (.. and maybe a few other other proprietary backends)...
>
> ... but this is something maybe to be addressed in general in a
> different context not something that can be addressed by this series.
>
> Sorry for the usual flood of words :P ... I'll have a more in deep
> review of the series in the next days, for now I wanted just to share my
> concerns and (maybe wrong) understanding and see what you or Sudeep and
> Souvik think about.
>
> Thanks,
> Cristian
>

Mike Tipton July 9, 2022, 3:28 a.m. UTC | #3

I'll let Neeraj respond to more of the core backend details and policy 
enforcement options, but I can provide some details for our prototype 
clock protocol handler. Note that it's a pretty simple proof-of-concept 
handler that's implemented entirely outside of the common clock 
framework. It operates as just another client to the framework. This 
approach has some limitations. And a more full-featured implementation 
could benefit from being implemented in the clock framework itself. But 
that level of support hasn't been necessary for our purposes yet.

On 6/13/2022 10:20 AM, Cristian Marussi wrote:
> +CC: Souvik
> 
> On Thu, Jun 09, 2022 at 12:49:53PM +0530, Neeraj Upadhyay wrote:
>> This RFC series, provides ARM System Control and Management Interface (SCMI)
>> protocol backend implementation for Virtio transport. The purpose of this
> 
> Hi Neeraj,
> 
> Thanks for this work, I only glanced through the series at first to
> grasp a general understanding of it (without goind into much details for
> now) and I'd have a few questions/concerns that I'll noted down below.
> 
> I focused mainly on the backend server aims/functionalities/issues ignoring
> at first the vhost-scmi entry-point since the vost-scmi accelerator is just
> a (more-or-less) standard means of configuring and grabbing SCMI traffic
> from the VMs into the Host Kernel and so I found more interesting at first
> to understand what we can do with such traffic at first.
> (IOW the vhost-scmi layer is welcome but remain to see what to do with it...)
> 
>> feature is to provide para-virtualized interfaces to guest VMs, to various
>> hardware blocks like clocks, regulators. This allows the guest VMs to
>> communicate their resource needs to the host, in the absence of direct
>> access to those resources.
> 
> In an SCMI stack the agents (like VMs) issue requests to an SCMI platform
> backend that is in charge of policying and armonizing such requests
> eventually denying some of these (possibly malicious) while allowing others
> (possibly armonizing/merging such reqs); with your solution basically the
> SCMI backend in Kernel marshals/conveys all of such SCMI requests to the
> proper Linux Kernel subsystem that is usually in charge of it, using
> dedicated protocol handlers that basically translates SCMI requests to
> Linux APIs calls to the Host. (I may have oversimplified or missed
> something...)
> 
> At the price of a bit of overhead and code-duplication introduced by
> this SCMI Backend you can indeed leverage the existing mechanisms for
> resource accounting and sharing included in such Linux subsystems (like
> Clock framework), and that's nice and useful, BUT how do you policy/filter
> (possibly dinamically as VMs come and go) what these VMs can see and do
> with these resources ?
> 

Currently, our only level of filtering is for which clocks we choose to 
expose over SCMI. Those chosen clocks are exposed to all VMs equally. 
The clock protocol handler exposes a registration function, which we 
call from our clock drivers. Which clocks we register are currently 
hardcoded in the drivers themselves. We often want to register all the 
clocks in a given driver, since we have separate drivers for each clock 
controller and many clock controllers are already dedicated to a 
particular core or subsystem. So if that core or subsystem needs to be 
controlled by a VM, then we give the VM all of its clocks. This can mean 
exposing a large number of clocks (in the hundreds).


> ... MORE importantly how do you protect the Host (or another VM) from
> unacceptable (or possibly malicious) requests conveyed from one VM request
> vqueue into the Linux subsystems (like clocks) ?
> 

The clock protocol handler tracks its own reference counts for each 
clock that's been registered with it. It'll only enable clocks through 
the host framework when the reference count increases from 0 -> 1, and 
it'll only disable clocks through host framework when the reference 
count decreases from 1 -> 0. And since the clock framework has its own 
internal reference counts, then it's not possible for a VM to disable 
clocks that the host itself has enabled.

We don't support frequency aggregation, so a VM could override the 
frequency request of another VM or of the host. We could support max 
aggregation across VMs, so that a VM couldn't reduce the frequency below 
what another VM has requested. But without clock framework changes, we 
can't aggregate with the local host clients. So a VM could reduce the 
frequency below what the host has requested.

Generally speaking we don't expect more than one entity (VM or host) to 
control a given clock at a time. But all we can currently enforce is 
that clocks only turn off when *all* entities (including the the host) 
request them to be off.


> I saw you have added a good deal of DT bindings for the backend
> describing protocols, so you could just expose only some protocols via
> the backend (if I get it right) but you cannot anyway selectively expose
> only a subset of resources to the different agents, so, if you expose the
> clock protocol, that will be visible by any VMs and an agent could potentially
> kill the Host or mount some clock related attack acting on the right clock.
> (I mean you cannot describe in the Host DT a number X of clocks to be
> supported by the Host Linux Clock framework BUT then expose selectively to
> the SCMI agents only a subset Y < X to shield the Host from misbehaviour...
> ...at least not in a dynamic way avoiding to bake a fixed policy into
> the backend...or maybe I'm missing how you can do that, in such a case
> please explain...)
> 
> Moreover, in a normal SCMI stack the server resides out of reach from the
> OSPM agents since the server, wherever it sits, has the last word and can
> deny and block unreasonable/malicious requests while armonizing others: this
> means the typical SCMI platform fw is configured in such a way that clearly
> defines a set of policies to be enforced between the access of the various
> agents. (and it can reside in the trusted codebase given its 'reduced'
> size...even though this policies are probably at the moment not so
> dynamically modificable there either...)
> 
> With your approach of a Linux Kernel based SCMI platform backend you are
> certainly using all the good and well proven mechanisms offered by the
> Kernel to share and co-ordinate access to such resources, which is good
> (.. even though Linux is not so small in term of codebase to be used as
> a TCB to tell the truth :D), BUT I don't see the same level of policying
> or filtering applied anywhere in the proposed RFCs, especially to protect
> the Host which at the end is supposed to use the same Linux subsystems and
> possibly share some of those resources for its own needs.
> 
> I saw the Base protocol basic implementation you provided to expose the
> supported backend protocols to the VMs, it would be useful to see how
> you plan to handle something like the Clock protocol you mention in the
> example below. (if you have Clock protocol backend that as WIP already
> would be interesting to see it...)
> 
> Another issue/criticality that comes to my mind is how do you gather in
> general basic resources states/descriptors from the existing Linux subsystems
> (even leaving out any policying concerns): as an example, how do you gather
> from the Host Clock framework the list of available clocks and their rates
> descriptors that you're going expose to a specific VMs once this latter will
> issue the related SCMI commands to get to know which SCMI Clock domain are
> available ?
> (...and I mean in a dynamic way not using a builtin per-platform baked set of
>   resources known to be made available... I doubt that any sort of DT
>   description would be accepted in this regards ...)
> 

As mentioned, the list of clocks we choose to expose are currently 
hardcoded in the clock drivers outside of the clock framework. There is 
no dynamic policy in place.

For supported rates, we currently just implement the 
CLOCK_DESCRIBE_RATES command using rate ranges, rather than lists of 
discrete rates (num_rates_flags[12] = 1). And we just communicate the 
full u32 range 0..U32_MAX with step_size=1. We do this for simplicity. 
Many of our clocks only support a small list of discrete rates (though 
some support large ranges). If a VM requests a rate not aligned to these 
discrete rates, then we'll just round up to what the host supports. We 
currently operate under the assumption that the VM knows what it needs 
and doesn't need to query the specific supported rates from the host. 
That's fine for our current use cases, at least. Publishing 
clock-specific rate lists and/or proper ranges would be more complicated 
and require some amount of clock framework changes to get this information.


>>
>> 1. Architecture overview
>> ---------------------
>>
>> Below diagram shows the overall software architecture of SCMI communication
>> between guest VM and the host software. In this diagram, guest is a linux
>> VM; also, host uses KVM linux.
>>
>>           GUEST VM                   HOST
>>   +--------------------+    +---------------------+    +--------------+
>>   |   a. Device A      |    |   k. Device B       |    |      PLL     |
>>   |  (Clock consumer)  |    |  (Clock consumer)   |    |              |
>>   +--------------------+    +---------------------+    +--------------+
>>            |                         |                         ^
>>            v                         v                         |
>>   +--------------------+    +---------------------+    +-----------------+
>>   | b. Clock Framework |    | j. Clock Framework  | -->| l. Clock Driver |
>>   +-- -----------------+    +---------------------+    +-----------------+
>>            |                         ^
>>            v                         |
>>   +--------------------+    +------------------------+
>>   |  c. SCMI Clock     |    | i. SCMI Virtio Backend |
>>   +--------------------+    +------------------------+
>>            |                         ^
>>            v                         |
>>   +--------------------+    +----------------------+
>>   |  d. SCMI Virtio    |    |   h. SCMI Vhost      |<-----------+
>>   +--------------------+    +----------------------+            |
>>            |                         ^                          |
>>            v                         |                          |
>> +-------------------------------------------------+    +-----------------+
>> |              e. Virtio Infra                    |    |    g. VMM       |
>> +-------------------------------------------------+    +-----------------+
>>            |                         ^                           ^
>>            v                         |                           |
>> +-------------------------------------------------+             |
>> |                f. Hypervisor                    |-------------
>> +-------------------------------------------------+
>>
> 
> Looking at the above schema and thinking out loud where any dynamic
> policying against the resources can fit (..and trying desperately NOT to push
> that into the Kernel too :P...) ... I think that XEN was trying something similar
> (with a real backend SCMI platform FW at the end of the pipe though I think...) and
> in their case the per-VMs resource allocation was performed using SCMI
> BASE_SET_DEVICE_PERMISSIONS commands issued by the Hypervisor/VMM itself
> I think or by a Dom0 elected as a trusted agent and so allowed to configure
> such resource partitioning ...
> 
> https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg113868.html
> 
> ...maybe a similar approach, with some sort of SCMI Trusted Agent living within
> the VMM and in charge of directing such resources' partitioning between
> VMs by issuing BASE_SET_DEVICE_PERMISSIONS towards the Kernel SCMI Virtio
> Backend, could help keeping at least the policy bits related to the VMs out of
> the kernel/DTs and possibly dynamically configurable following VMs lifecycle.
> 
> Even though, in our case ALL the resource management by device ID would have to
> happen in the Kernel SCMI backend at the end, given that is where the SCMI
> platform resides indeed, BUT at least you could keep the effective policy out of
> kernel space, doing something like:
> 
> 1. VMM/TrustedAgent query Kernel_SCMI_Virtio_backend for available resources
> 
> 2. VMM/TrustedAg decides resources allocation between VMs (and/or possibly the Host
>     based on some configured policy)
> 
> 3. VMM/TrustedAgent issues BASE_SET_DEVICE_PERMISSIONS/PROTOCOLS to the
>     Kernel_SCMI_Virtio_backend
> 
> 4. Kernel_SCMI_Virtio_backend enforces resource partioning and sharing
>     when processing subsequent VMs SCMI requests coming via Vhost-SCMI
> 
> ...where the TrustedAgent here could be (I guess) the VMM or the Host or
> both with different level of privilege if you don't want the VMM to be able
> to configure resources access for the whole Host.
> 
>> a. Device A             This is the client kernel driver in guest VM,
>>                          for ex. diplay driver, which uses standard
>>                          clock framework APIs to vote for a clock.
>>
>> b. Clock Framework      Underlying kernel clock framework on
>>                          guest.
>>
>> c. SCMI Clock           SCMI interface based clock driver.
>>
>> d. SCMI Virtio          Underlying SCMI framework, using Virtio as
>>                          transport driver.
>>
>> e. Virtio Infra         Virtio drivers on guest VM. These drivers
>>                          initiate virtqueue requests over Virtio
>>                          transport (MMIO/PCI), and forwards response
>>                          to SCMI Virtio registered callbacks.
>>
>> f. Hypervisor           Hosted Hypervisor (KVM for ex.), which traps
>>                          and forwards requests on virtqueue ring
>>                          buffers to the VMM.
>>
>> g. VMM                  Virtual Machine Monitor, running on host userspace,
>>                          which manages the lifecycle of guest VMs, and forwards
>>                          guest initiated virtqueue requests as IOCTLs to the
>>                          Vhost driver on host.
>>
>> h. SCMI Vhost           In kernel driver, which handles SCMI virtqueue
>>                          requests from guest VMs. This driver forwards the
>>                          requests to SCMI Virtio backend driver, and returns
>>                          the response from backend, over the virtqueue ring
>>                          buffers.
>>
>> i. SCMI Virtio Backend  SCMI backend, which handles the incoming SCMI messages
>>                          from SCMI Vhost driver, and forwards them to the
>>                          backend protocols like clock and voltage protocols.
>>                          The backend protocols uses the host apis for those
>>                          resources like clock APIs provided by clock framework,
>>                          to vote/request for the resource. The response from
>>                          the host api is parceled into a SCMI response message,
>>                          and is returned to the SCMI Vhost driver. The SCMI
>>                          Vhost driver in turn, returns the reponse over the
>>                          Virtqueue reponse buffers.
>>
> 
> Last but not least, this SCMI Virtio Backend layer in charge of
> processing incoming SCMI packets, interfacing with the Linux subsystems
> final backend and building SCMI replies from Linux will introduce a
> certain level of code/funcs duplication given that this same SCMI basic
> processing capabilities have been already baked in the SCMI stacks found in
> SCP and in TF-A (.. and maybe a few other other proprietary backends)...
> 
> ... but this is something maybe to be addressed in general in a
> different context not something that can be addressed by this series.
> 
> Sorry for the usual flood of words :P ... I'll have a more in deep
> review of the series in the next days, for now I wanted just to share my
> concerns and (maybe wrong) understanding and see what you or Sudeep and
> Souvik think about.
> 
> Thanks,
> Cristian
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Neeraj Upadhyay July 27, 2022, 4:17 a.m. UTC | #4

Hi Cristian,

Thanks for your feedback! Sorry, it took long before replying. Few 
thoughts inline to your comments.

On 6/13/2022 10:50 PM, Cristian Marussi wrote:
> +CC: Souvik
> 
> On Thu, Jun 09, 2022 at 12:49:53PM +0530, Neeraj Upadhyay wrote:
>> This RFC series, provides ARM System Control and Management Interface (SCMI)
>> protocol backend implementation for Virtio transport. The purpose of this
> 
> Hi Neeraj,
> 
> Thanks for this work, I only glanced through the series at first to
> grasp a general understanding of it (without goind into much details for
> now) and I'd have a few questions/concerns that I'll noted down below.
> 
> I focused mainly on the backend server aims/functionalities/issues ignoring
> at first the vhost-scmi entry-point since the vost-scmi accelerator is just
> a (more-or-less) standard means of configuring and grabbing SCMI traffic
> from the VMs into the Host Kernel and so I found more interesting at first
> to understand what we can do with such traffic at first.
> (IOW the vhost-scmi layer is welcome but remain to see what to do with it...)
> 
>> feature is to provide para-virtualized interfaces to guest VMs, to various
>> hardware blocks like clocks, regulators. This allows the guest VMs to
>> communicate their resource needs to the host, in the absence of direct
>> access to those resources.
> 
> In an SCMI stack the agents (like VMs) issue requests to an SCMI platform
> backend that is in charge of policying and armonizing such requests
> eventually denying some of these (possibly malicious) while allowing others
> (possibly armonizing/merging such reqs); with your solution basically the
> SCMI backend in Kernel marshals/conveys all of such SCMI requests to the
> proper Linux Kernel subsystem that is usually in charge of it, using
> dedicated protocol handlers that basically translates SCMI requests to
> Linux APIs calls to the Host. (I may have oversimplified or missed
> something...)
> 
> At the price of a bit of overhead and code-duplication introduced by
> this SCMI Backend you can indeed leverage the existing mechanisms for
> resource accounting and sharing included in such Linux subsystems (like
> Clock framework), and that's nice and useful, BUT how do you policy/filter
> (possibly dinamically as VMs come and go) what these VMs can see and do
> with these resources ?
> 
> ... MORE importantly how do you protect the Host (or another VM) from
> unacceptable (or possibly malicious) requests conveyed from one VM request
> vqueue into the Linux subsystems (like clocks) ?
> 
> I saw you have added a good deal of DT bindings for the backend
> describing protocols, so you could just expose only some protocols via
> the backend (if I get it right) but you cannot anyway selectively expose
> only a subset of resources to the different agents, so, if you expose the
> clock protocol, that will be visible by any VMs and an agent could potentially
> kill the Host or mount some clock related attack acting on the right clock.
> (I mean you cannot describe in the Host DT a number X of clocks to be
> supported by the Host Linux Clock framework BUT then expose selectively to
> the SCMI agents only a subset Y < X to shield the Host from misbehaviour...
> ...at least not in a dynamic way avoiding to bake a fixed policy into
> the backend...or maybe I'm missing how you can do that, in such a case
> please explain...)
> 
> Moreover, in a normal SCMI stack the server resides out of reach from the
> OSPM agents since the server, wherever it sits, has the last word and can
> deny and block unreasonable/malicious requests while armonizing others: this
> means the typical SCMI platform fw is configured in such a way that clearly
> defines a set of policies to be enforced between the access of the various
> agents. (and it can reside in the trusted codebase given its 'reduced'
> size...even though this policies are probably at the moment not so
> dynamically modificable there either...)
> 
> With your approach of a Linux Kernel based SCMI platform backend you are
> certainly using all the good and well proven mechanisms offered by the
> Kernel to share and co-ordinate access to such resources, which is good
> (.. even though Linux is not so small in term of codebase to be used as
> a TCB to tell the truth :D), BUT I don't see the same level of policying
> or filtering applied anywhere in the proposed RFCs, especially to protect
> the Host which at the end is supposed to use the same Linux subsystems and
> possibly share some of those resources for its own needs.
> 
> I saw the Base protocol basic implementation you provided to expose the
> supported backend protocols to the VMs, it would be useful to see how
> you plan to handle something like the Clock protocol you mention in the
> example below. (if you have Clock protocol backend that as WIP already
> would be interesting to see it...) >
> Another issue/criticality that comes to my mind is how do you gather in
> general basic resources states/descriptors from the existing Linux subsystems
> (even leaving out any policying concerns): as an example, how do you gather
> from the Host Clock framework the list of available clocks and their rates
> descriptors that you're going expose to a specific VMs once this latter will
> issue the related SCMI commands to get to know which SCMI Clock domain are
> available ?
> (...and I mean in a dynamic way not using a builtin per-platform baked set of
>   resources known to be made available... I doubt that any sort of DT
>   description would be accepted in this regards ...)
> 
>>
>> 1. Architecture overview
>> ---------------------
>>
>> Below diagram shows the overall software architecture of SCMI communication
>> between guest VM and the host software. In this diagram, guest is a linux
>> VM; also, host uses KVM linux.
>>
>>           GUEST VM                   HOST
>>   +--------------------+    +---------------------+    +--------------+
>>   |   a. Device A      |    |   k. Device B       |    |      PLL     |
>>   |  (Clock consumer)  |    |  (Clock consumer)   |    |              |
>>   +--------------------+    +---------------------+    +--------------+
>>            |                         |                         ^
>>            v                         v                         |
>>   +--------------------+    +---------------------+    +-----------------+
>>   | b. Clock Framework |    | j. Clock Framework  | -->| l. Clock Driver |
>>   +-- -----------------+    +---------------------+    +-----------------+
>>            |                         ^
>>            v                         |
>>   +--------------------+    +------------------------+
>>   |  c. SCMI Clock     |    | i. SCMI Virtio Backend |
>>   +--------------------+    +------------------------+
>>            |                         ^
>>            v                         |
>>   +--------------------+    +----------------------+
>>   |  d. SCMI Virtio    |    |   h. SCMI Vhost      |<-----------+
>>   +--------------------+    +----------------------+            |
>>            |                         ^                          |
>>            v                         |                          |
>> +-------------------------------------------------+    +-----------------+
>> |              e. Virtio Infra                    |    |    g. VMM       |
>> +-------------------------------------------------+    +-----------------+
>>            |                         ^                           ^
>>            v                         |                           |
>> +-------------------------------------------------+             |
>> |                f. Hypervisor                    |-------------
>> +-------------------------------------------------+
>>
> 
> Looking at the above schema and thinking out loud where any dynamic
> policying against the resources can fit (..and trying desperately NOT to push
> that into the Kernel too :P...) ... I think that XEN was trying something similar
> (with a real backend SCMI platform FW at the end of the pipe though I think...) and
> in their case the per-VMs resource allocation was performed using SCMI
> BASE_SET_DEVICE_PERMISSIONS commands issued by the Hypervisor/VMM itself
> I think or by a Dom0 elected as a trusted agent and so allowed to configure
> such resource partitioning ...
> 
> https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg113868.html
> 
> ...maybe a similar approach, with some sort of SCMI Trusted Agent living within
> the VMM and in charge of directing such resources' partitioning between
> VMs by issuing BASE_SET_DEVICE_PERMISSIONS towards the Kernel SCMI Virtio
> Backend, could help keeping at least the policy bits related to the VMs out of
> the kernel/DTs and possibly dynamically configurable following VMs lifecycle.
> 
> Even though, in our case ALL the resource management by device ID would have to
> happen in the Kernel SCMI backend at the end, given that is where the SCMI
> platform resides indeed, BUT at least you could keep the effective policy out of
> kernel space, doing something like:
> 
> 1. VMM/TrustedAgent query Kernel_SCMI_Virtio_backend for available resources
> 
> 2. VMM/TrustedAg decides resources allocation between VMs (and/or possibly the Host
>     based on some configured policy)
> 
> 3. VMM/TrustedAgent issues BASE_SET_DEVICE_PERMISSIONS/PROTOCOLS to the
>     Kernel_SCMI_Virtio_backend
> 
> 4. Kernel_SCMI_Virtio_backend enforces resource partioning and sharing
>     when processing subsequent VMs SCMI requests coming via Vhost-SCMI
> 
> ...where the TrustedAgent here could be (I guess) the VMM or the Host or
> both with different level of privilege if you don't want the VMM to be able
> to configure resources access for the whole Host.
> 

Thanks for sharing your thoughts on this. Some thoughts on this:

One of the challenges in device ID based resource management appears to 
be, mapping these devices to SCMI protocol resources (clocks, 
regulators), and providing a means for VMM/TrustedAgent(userspace) to 
query and identify devices (to maintain policy information) and request 
for those SCMI devices for each VM.


As SCMI spec does not cover the discovery of device ids and how they
are mapped to protocol resources likes clocks and voltage ids.

Going though previous discussions (thanks Vincent for sharing this 
link!) [1] , looks like there has been discussions around similar 
concepts, where device node contains <vendor>,scmi_devid device 
property, to map a device to the corresponding SCMI device. Those
discussions also mention about some ongoing work in the SCMI spec,
on device-id. Putting some of thoughts here, on managing device IDs
in Kernel_SCMI_Virtio_backend. Looking for inputs on this.


1. Device representation in device tree

Alternative 1

Add arm,scmi-devid property to device nodes, similar to the approach in 
[1]. Device management software component of Kernel_SCMI_Virtio_backend 
parses device tree to get information about these devices and map them 
to protocol resources, by checking the "clocks", "-supply" regulator 
nodes and finding the corresponding scmi clock / voltage ID for them.

With this approach, we would also need to maintain some name (using 
arm,scmi-devname) in addition to the id for each node? One problem
with this approach looks to be, device ids are not maintained
at a centralized place and spread across the device nodes. How do we
assign these ids to various nodes i.e. what is the correct device id
for lets say usb node and how this can be enforced? Maybe we do not
need to maintain device id in device tree and only maintain 
arm,scmi-devname, and Device management sw component dynamically assigns an
incremental device ID to each device node, which has arm,scmi-devname 
property. However,this means device ID for a node is not fixed and 
device policy need to use device names, which might be difficult to 
maintain?

Another problem looks to be tight coupling between the resource 
properties in a device node and its corresponding SCMI device. Parsing 
the specific resource properties like "clocks", "-supply" might become 
cumbersome (we would need to identify which property, and its
representation for each resource provided by SCMI protocol) to extend to 
other resources? What if we want to map SCMI device to only a subset of 
clocks/regulators, and not to the full set of lets say clocks for a 
device node? Do we need that facility?


Alternative 2

Maintain arm,scmi-devid property for SCMI devices defined within scmi 
backend node.


// 1. Use phandle for a host device, to get device specific resources.

scmi-vio-backend {
      compatible = "arm,scmi-vio-backend";

      devices {

         device@1 {
           arm,scmi-devid = 1;
           arm,scmi-devname = "USB";
           arm,scmi-basedev = <&usb_device>;
         };
      };
};

OR


// 2. Use phandles of specific clocks/regulators within SCMI device.

scmi-vio-backend {
      compatible = "arm,scmi-vio-backend";

      devices {

         device@1 {
           arm,scmi-devid = 1;
           arm,scmi-devname = "USB";
           clocks = <&clock_phandle ...>;
           *-supply = <&regulator_phandle>;
         };
      };
};

OR

// 3. Use SCMI protocol specific clock and voltage IDs  in SCMI device.

scmi-vio-backend {
      compatible = "arm,scmi-vio-backend";

      devices {

         device@1 {
           arm,scmi-devid = 1;
           arm,scmi-devname = "USB";
           arm,scmi-clock-ids = <clock_id1 clock_id2 ...>;
           arm,scmi-voltage-ids = <voltage_id1 voltage_id2 ...>;
         };
      };
};


2. Resource discovery and policy management within VMM/TrustedAgent

a. VMM/TrustedAgent assigns agent ID to a VM using
    SCMI_ASSIGN_AGENT_INFO ioctl to SCMI vhost. Same ID and name mapping
    is returned by BASE_DISCOVER_AGENT SCMI message.

b. VMM/TrustedAgent does SCMI_GET_DEVICE_ATTRIBUTES ioctl to get the
    # of devices.

c. VMM/TrustedAgent does SCMI_GET_DEVICES ioctl to get the list of
    all device IDs.

d. VMM/TrustedAgent does SCMI_GET_DEVICE_INFO to get the name for a
    device ID.

e. VMM/TrustedAgent does BASE_SET_DEVICE_PERMISSIONS using ioctl to
    allow/revoke permissions for an agent id (which maps to a VM), for
    a device. VMM/TrustedAgent would need to maintain information
    about which device IDs a VM is allowed to access. These policies
    could be platform specific.


Thanks
Neeraj

[1] 
https://lore.kernel.org/lkml/cover.1645460043.git.oleksii_moisieiev@epam.com/

>> a. Device A             This is the client kernel driver in guest VM,
>>                          for ex. diplay driver, which uses standard
>>                          clock framework APIs to vote for a clock.
>>
>> b. Clock Framework      Underlying kernel clock framework on
>>                          guest.
>>
>> c. SCMI Clock           SCMI interface based clock driver.
>>
>> d. SCMI Virtio          Underlying SCMI framework, using Virtio as
>>                          transport driver.
>>
>> e. Virtio Infra         Virtio drivers on guest VM. These drivers
>>                          initiate virtqueue requests over Virtio
>>                          transport (MMIO/PCI), and forwards response
>>                          to SCMI Virtio registered callbacks.
>>
>> f. Hypervisor           Hosted Hypervisor (KVM for ex.), which traps
>>                          and forwards requests on virtqueue ring
>>                          buffers to the VMM.
>>
>> g. VMM                  Virtual Machine Monitor, running on host userspace,
>>                          which manages the lifecycle of guest VMs, and forwards
>>                          guest initiated virtqueue requests as IOCTLs to the
>>                          Vhost driver on host.
>>
>> h. SCMI Vhost           In kernel driver, which handles SCMI virtqueue
>>                          requests from guest VMs. This driver forwards the
>>                          requests to SCMI Virtio backend driver, and returns
>>                          the response from backend, over the virtqueue ring
>>                          buffers.
>>
>> i. SCMI Virtio Backend  SCMI backend, which handles the incoming SCMI messages
>>                          from SCMI Vhost driver, and forwards them to the
>>                          backend protocols like clock and voltage protocols.
>>                          The backend protocols uses the host apis for those
>>                          resources like clock APIs provided by clock framework,
>>                          to vote/request for the resource. The response from
>>                          the host api is parceled into a SCMI response message,
>>                          and is returned to the SCMI Vhost driver. The SCMI
>>                          Vhost driver in turn, returns the reponse over the
>>                          Virtqueue reponse buffers.
>>
> 
> Last but not least, this SCMI Virtio Backend layer in charge of
> processing incoming SCMI packets, interfacing with the Linux subsystems
> final backend and building SCMI replies from Linux will introduce a
> certain level of code/funcs duplication given that this same SCMI basic
> processing capabilities have been already baked in the SCMI stacks found in
> SCP and in TF-A (.. and maybe a few other other proprietary backends)...
> 
> ... but this is something maybe to be addressed in general in a
> different context not something that can be addressed by this series.
> 
> Sorry for the usual flood of words :P ... I'll have a more in deep
> review of the series in the next days, for now I wanted just to share my
> concerns and (maybe wrong) understanding and see what you or Sudeep and
> Souvik think about.
> 
> Thanks,
> Cristian
>