[v7,0/2] System Generation ID driver and VMGENID backend

Message ID 1614156452-17311-1-git-send-email-acatan@amazon.com (mailing list archive)

Message

Catangiu, Adrian Costin Feb. 24, 2021, 8:47 a.m. UTC
This feature is aimed at virtualized or containerized environments
where VM or container snapshotting duplicates memory state, which is a
challenge for applications that want to generate unique data such as
request IDs, UUIDs, and cryptographic nonces.

The patch set introduces a mechanism that provides a userspace
interface through which applications and libraries are made aware of
uniqueness-breaking events such as VM or container snapshotting, and
which allows them to react and adapt to such events.

Solving the uniqueness problem strongly enough for cryptographic
purposes requires a mechanism which can deterministically reseed
userspace PRNGs with new entropy at restore time. This mechanism must
also support the high-throughput and low-latency use-cases that led
programmers to pick a userspace PRNG in the first place; be usable by
both application code and libraries; allow transparent retrofitting
behind existing popular PRNG interfaces without changing application
code; be efficient, especially on snapshot restore; and be simple
enough for wide adoption.
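
To make the requirement above concrete, here is a minimal userspace
sketch of a PRNG that compares a generation counter on every call and
reseeds when it has changed. It is an illustration only, not code from
the patches: struct guarded_prng, prng_init(), prng_next() and
demo_seed() are hypothetical names, the counter is a plain variable
standing in for whatever page the driver would map, and the generator
is a toy xorshift64 rather than a cryptographic one. A real user would
take seeds from a proper entropy source such as getrandom().

```c
#include <stdint.h>

/* Hypothetical PRNG state; none of these names come from the patch set. */
struct guarded_prng {
	const volatile uint32_t *gen;  /* generation counter (assumed mapped read-only) */
	uint32_t cached_gen;           /* generation observed at the last (re)seed */
	uint64_t state;                /* toy xorshift64 state, NOT cryptographic */
	uint64_t (*fresh_seed)(void);  /* caller-supplied source of new entropy */
	unsigned int reseeds;          /* number of (re)seeds, for illustration */
};

/* Stand-in seed source for the sketch: returns a different value each call. */
uint64_t demo_seed(void)
{
	static uint64_t n = 0x9e3779b97f4a7c15ULL;

	n += 0x9e3779b97f4a7c15ULL;
	return n;
}

static void prng_reseed(struct guarded_prng *p)
{
	uint64_t s = p->fresh_seed();

	p->state = s ? s : 1;  /* xorshift state must never be zero */
	p->reseeds++;
}

void prng_init(struct guarded_prng *p, const volatile uint32_t *gen,
	       uint64_t (*fresh_seed)(void))
{
	p->gen = gen;
	p->fresh_seed = fresh_seed;
	p->reseeds = 0;
	p->cached_gen = *gen;
	prng_reseed(p);
}

uint64_t prng_next(struct guarded_prng *p)
{
	uint32_t now = *p->gen;  /* fast path: one load and one compare */
	uint64_t x;

	if (now != p->cached_gen) {
		/* Generation changed (e.g. snapshot restore): drop old state. */
		p->cached_gen = now;
		prng_reseed(p);
	}

	x = p->state;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	p->state = x;
	return x;
}
```

The fast path costs a single memory load and compare, which is the
property the low-latency requirement is after. The sketch does not
close the window between the check and the use of the output; it only
shows the reseed-on-change idea.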

The first patch in the set implements a device driver which exposes
the /dev/sysgenid char device to userspace. Its associated filesystem
operations can be used to build a system-level safe workflow that
guest software can follow to protect itself from negative system
snapshot effects.

The second patch in the set adds a VMGENID driver which makes use of
the ACPI vmgenid device to drive SysGenID and to reseed kernel entropy
following VM snapshots.
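
The changelog (v1 -> v2) notes that userspace sees a monotonically
increasing u32 counter rather than the hypervisor-provided 128-bit
UUID. A rough sketch of how a backend could translate one into the
other is shown here. This is a userspace model under assumptions, not
the driver code: struct gen_backend, backend_init() and
backend_notify() are hypothetical names, and feeding the UUID into the
kernel RNG is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GEN_ID_LEN 16  /* 128-bit identifier, per the changelog */

/* Hypothetical backend state; names are illustrative only. */
struct gen_backend {
	uint8_t last_id[GEN_ID_LEN];  /* last identifier seen from the hypervisor */
	uint32_t counter;             /* value that would be exposed to userspace */
};

void backend_init(struct gen_backend *b, const uint8_t id[GEN_ID_LEN])
{
	memcpy(b->last_id, id, GEN_ID_LEN);
	b->counter = 0;
}

/*
 * Called when the hypervisor signals a possible generation change.
 * Returns true if the identifier differs and the counter was bumped.
 */
bool backend_notify(struct gen_backend *b, const uint8_t id[GEN_ID_LEN])
{
	if (memcmp(b->last_id, id, GEN_ID_LEN) == 0)
		return false;  /* same identifier: not a new generation */

	memcpy(b->last_id, id, GEN_ID_LEN);
	b->counter++;
	return true;
}
```

Keeping the identifier private and exposing only the counter means
userspace needs a single 32-bit compare to detect a change.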

**Please note**, SysGenID alone does not guarantee complete snapshot
safety to applications using it. A certain workflow needs to be
followed at the system level, in order to make the system
snapshot-resilient. Please see the "Snapshot Safety Prerequisites"
section in the included SysGenID documentation.

---

v6 -> v7:
  - remove sysgenid uevent

v5 -> v6:

  - sysgenid: watcher tracking disabled by default
  - sysgenid: add SYSGENID_SET_WATCHER_TRACKING ioctl to allow each
    file descriptor to set whether they should be tracked as watchers
  - rename SYSGENID_FORCE_GEN_UPDATE -> SYSGENID_TRIGGER_GEN_UPDATE
  - rework all documentation to clearly capture all prerequisites for
    achieving snapshot safety when using the provided mechanism
  - sysgenid documentation: replace individual filesystem operations
    examples with a higher level example showcasing system-level
    snapshot-safe workflow

v4 -> v5:

  - sysgenid: generation changes are also exported through uevents
  - remove SYSGENID_GET_OUTDATED_WATCHERS ioctl
  - document sysgenid ioctl major/minor numbers

v3 -> v4:

  - split functionality in two separate kernel modules: 
    1. drivers/misc/sysgenid.c which provides the generic userspace
       interface and mechanisms
    2. drivers/virt/vmgenid.c as VMGENID acpi device driver that seeds
       kernel entropy and acts as a driving backend for the generic
       sysgenid
  - rename /dev/vmgenid -> /dev/sysgenid
  - rename uapi header file vmgenid.h -> sysgenid.h
  - rename ioctls VMGENID_* -> SYSGENID_*
  - add ‘min_gen’ parameter to SYSGENID_FORCE_GEN_UPDATE ioctl
  - fix races in documentation examples

v2 -> v3:

  - separate the core driver logic and interface, from the ACPI device.
    The ACPI vmgenid device is now one possible backend
  - fix issue when timeout=0 in VMGENID_WAIT_WATCHERS
  - add locking to avoid races between fs ops handlers and hw irq
    driven generation updates
  - change VMGENID_WAIT_WATCHERS ioctl so if the current caller is
    outdated or a generation change happens while waiting (thus making
    current caller outdated), the ioctl returns -EINTR to signal the
    user to handle event and retry. Fixes blocking on oneself
  - add VMGENID_FORCE_GEN_UPDATE ioctl conditioned by
    CAP_CHECKPOINT_RESTORE capability, through which software can force
    generation bump

v1 -> v2:

  - expose to userspace a monotonically increasing u32 Vm Gen Counter
    instead of the hw VmGen UUID
  - since the hw/hypervisor-provided 128-bit UUID is not public
    anymore, add it to the kernel RNG as device randomness
  - insert driver page containing Vm Gen Counter in the user vma in
    the driver's mmap handler instead of using a fault handler
  - turn driver into a misc device driver to auto-create /dev/vmgenid
  - change ioctl arg to avoid leaking kernel structs to userspace
  - update documentation

Adrian Catangiu (2):
  drivers/misc: sysgenid: add system generation id driver
  drivers/virt: vmgenid: add vm generation id driver

 Documentation/misc-devices/sysgenid.rst            | 229 +++++++++++++++
 Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
 Documentation/virt/vmgenid.rst                     |  36 +++
 MAINTAINERS                                        |  15 +
 drivers/misc/Kconfig                               |  15 +
 drivers/misc/Makefile                              |   1 +
 drivers/misc/sysgenid.c                            | 322 +++++++++++++++++++++
 drivers/virt/Kconfig                               |  13 +
 drivers/virt/Makefile                              |   1 +
 drivers/virt/vmgenid.c                             | 153 ++++++++++
 include/uapi/linux/sysgenid.h                      |  18 ++
 11 files changed, 804 insertions(+)
 create mode 100644 Documentation/misc-devices/sysgenid.rst
 create mode 100644 Documentation/virt/vmgenid.rst
 create mode 100644 drivers/misc/sysgenid.c
 create mode 100644 drivers/virt/vmgenid.c
 create mode 100644 include/uapi/linux/sysgenid.h

Comments

Michael S. Tsirkin Feb. 24, 2021, 9:05 a.m. UTC | #1
On Wed, Feb 24, 2021 at 10:47:30AM +0200, Adrian Catangiu wrote:
> v6 -> v7:
>   - remove sysgenid uevent

How about we drop mmap too?

There's simply no way I can see to make it safe, and
no implementation is worse than a racy one imho.

Yeah, there's some documentation explaining how it is not
supposed to be used, but it will *seem* to work for people
and we will be stuck trying to maintain it.

Let's see if userspace using this often enough to make the
system call 



Catangiu, Adrian Costin March 4, 2021, 8:08 p.m. UTC | #2
Hi Michael,

On 24/02/2021, 11:06, "Michael S. Tsirkin" <mst@redhat.com> wrote:

    On Wed, Feb 24, 2021 at 10:47:30AM +0200, Adrian Catangiu wrote:
    > v6 -> v7:
    >   - remove sysgenid uevent

    How about we drop mmap too?

    There's simply no way I can see to make it safe, and
    no implementation is worse than a racy one imho.

    Yeah, there's some documentation explaining how it is not
    supposed to be used, but it will *seem* to work for people
    and we will be stuck trying to maintain it.

    Let's see if userspace using this often enough to make the
    system call

As Colm explained in his reply, mmap is the only way to consume this
within the strict latency constraints of PRNGs and SSL libs. So what
if we instead remove the IRQ race by removing vmgenid as an in-kernel
sysgenid backend/driver?

We could just drop the vmgenid driver for now and only drive sysgenid
from userspace using the fs interface. Doing so will remove the IRQ race
which comes from vmgenid backend, and will keep the SysGenID kernel
interface safe and consistent, with a race-free mmap().

What do you think?

Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.