Message ID: 20231214103520.7198-1-yan.y.zhao@intel.com (mailing list archive)
State: New, archived
Series: [RFC] KVM: Introduce KVM VIRTIO device
> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Thursday, December 14, 2023 6:35 PM
>
> - For host non-MMIO pages,
>   * virtio guest frontend and host backend driver should be synced to use
>     the same memory type to map a buffer. Otherwise, there will be
>     potential problem for incorrect memory data. But this will only impact
>     the buggy guest alone.
>   * for live migration,
>     as QEMU will read all guest memory during live migration, page aliasing
>     could happen.
>     Current thinking is to disable live migration if a virtio device has
>     indicated its noncoherent state.
>     As a follow-up, we can discuss other solutions. e.g.
>     (a) switching back to coherent path before starting live migration.

both guest/host switching to coherent or host-only?

host-only certainly is problematic if the guest is still using non-coherent.

on the other hand I'm not sure whether the host/guest gfx stack is capable of
switching between the coherent and non-coherent paths on the fly while the
buffer is being rendered.

>     (b) read/write of guest memory with clflush during live migration.

write is irrelevant as it's only done in the resume path where the guest is
not running.

> Implementation Consideration
> ===
> There is a previous series [1] from google to serve the same purpose to
> let KVM be aware of virtio GPU's noncoherent DMA status. That series
> requires a new memslot flag, and special memslots in user space.
>
> We don't choose to use memslot flag to request honoring guest memory
> type.

memslot flag has the potential to restrict the impact e.g. when using
clflush-before-read in migration? Of course the implication is to honor guest
type only for the selected slot in KVM instead of applying it to the entire
guest memory as in the previous series (which selects this way because
vmx_get_mt_mask() is in a perf-critical path, hence not good to check a
memslot flag?)

> Instead we hope to make the honoring request to be explicit (not tied to a
> memslot flag). This is because once guest memory type is honored, not only
> memory used by guest virtio device, but all guest memory is facing page
> aliasing issue potentially. KVM needs a generic solution to take care of
> page aliasing issue rather than counting on memory type of a special
> memslot being aligned in host and guest.
> (we can discuss what a generic solution to handle page aliasing issue will
> look like in later follow-up series).
>
> On the other hand, we choose to introduce a KVM virtio device rather than
> just provide an ioctl to wrap kvm_arch_[un]register_noncoherent_dma()
> directly, which is based on considerations that

I wonder if it's over-engineered for the purpose.

why not just introduce a KVM_CAP and allow the VMM to enable it? KVM doesn't
need to know the exact source of the requirement...
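(As a concrete illustration of option (b) above, flushing CPU caches before
the VMM reads guest memory during migration, here is a minimal userspace
sketch. The helper name and its call site in the migration loop are
assumptions; _mm_clflush() and _mm_sfence() are standard x86 intrinsics.)

/*
 * Sketch only: flush the CPU caches covering a guest RAM range before the
 * VMM reads it, so the reads observe data the guest wrote through a
 * non-coherent (e.g. WC) mapping.
 */
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache-line size */

static void flush_guest_range(void *hva, size_t len)
{
	uintptr_t p = (uintptr_t)hva & ~(uintptr_t)(CACHE_LINE - 1);
	uintptr_t end = (uintptr_t)hva + len;

	for (; p < end; p += CACHE_LINE)
		_mm_clflush((void *)p);	/* write back + invalidate one line */
	_mm_sfence();			/* order flushes before later reads */
}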
On Fri, Dec 15, 2023 at 02:23:48PM +0800, Tian, Kevin wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Thursday, December 14, 2023 6:35 PM
> >
[...]
> >     (a) switching back to coherent path before starting live migration.
>
> both guest/host switching to coherent or host-only?
>
> host-only certainly is problematic if the guest is still using non-coherent.

Both.

> on the other hand I'm not sure whether the host/guest gfx stack is capable of
> switching between the coherent and non-coherent paths on the fly while the
> buffer is being rendered.

Yes, I'm also not sure about it. But it's an option, though.

> >     (b) read/write of guest memory with clflush during live migration.
>
> write is irrelevant as it's only done in the resume path where the guest is
> not running.

Given that the host write is with PAT WB and the hardware is in no-snoop mode,
is it better to perform a cache flush after the host write? (We can do more
investigation to check whether it's necessary.)
BTW, there's also post-copy live migration, in which case the guest is
running :)

[...]
> memslot flag has the potential to restrict the impact e.g. when using
> clflush-before-read in migration? Of course the implication is to honor guest
> type only for the selected slot in KVM instead of applying it to the entire
> guest memory as in the previous series (which selects this way because
> vmx_get_mt_mask() is in a perf-critical path, hence not good to check a
> memslot flag?)

I think checking a memslot flag is in itself all right. But a memslot flag
does not carry the memory type that the host is using for the memslot.
On the other hand, virtio GPU is not the only source of non-coherent DMAs.
The memslot flag approach is not applicable to pass-through GPUs, due to the
lack of coordination between guest and host.

[...]
> I wonder if it's over-engineered for the purpose.
>
> why not just introduce a KVM_CAP and allow the VMM to enable it?

As we hope to increase the non-coherent DMA count on hot-plug of a
non-coherent device and decrease the count on hot-unplug of that device, a
KVM_CAP looks like it would require the user to maintain a ref count before
turning it on/off, which is less desirable. Agree?

> KVM doesn't need to know the exact source of the requirement...

Maybe we can use the source info in a way like this:
1. indicate that the source is not a passthrough device
2. record the relationship between GPA and memory type.

Then, if KVM knows the non-coherent DMAs do not involve any passthrough
devices, it can force a GPA's memory type (by ignoring guest PAT) to the one
specified by the host (in 2), so as to avoid cache flush operations before
live migration.
If passthrough devices get involved later, we can zap the EPT and rebuild the
memory types to honor guest PAT, resorting to cache flushes before live
migration to maintain coherency.
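(A minimal sketch of the two-mode logic Yan describes; every helper name below
is invented for illustration, and none of this is existing KVM code.)

/*
 * Hypothetical helpers only: "force the host-specified type when all
 * noncoherent-DMA sources are paravirt" vs. "honor guest PAT once a
 * passthrough device is involved".
 */
static u8 effective_memtype(struct kvm *kvm, gfn_t gfn)
{
	if (!kvm_has_passthrough_noncoherent_dma(kvm))	/* invented */
		return host_recorded_memtype(kvm, gfn);	/* from step 2 */

	return guest_pat_memtype(kvm, gfn);		/* honor guest PAT */
}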
+Yiwei

On Fri, Dec 15, 2023, Kevin Tian wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Thursday, December 14, 2023 6:35 PM
> >
[...]
> > We don't choose to use memslot flag to request honoring guest memory
> > type.
>
> memslot flag has the potential to restrict the impact e.g. when using
> clflush-before-read in migration?

Yep, exactly. E.g. if KVM needs to ensure coherency when freeing memory back
to the host kernel, then the memslot flag will allow for a much more targeted
operation.

> Of course the implication is to honor guest type only for the selected slot
> in KVM instead of applying it to the entire guest memory as in the previous
> series (which selects this way because vmx_get_mt_mask() is in a
> perf-critical path, hence not good to check a memslot flag?)

Checking a memslot flag won't impact performance. KVM already has the memslot
when creating SPTEs, e.g. the sole caller of vmx_get_mt_mask(), make_spte(),
has access to the memslot.

That isn't coincidental, KVM _must_ have the memslot to construct the SPTE,
e.g. to retrieve the associated PFN, update write-tracking for shadow pages,
etc.

I added Yiwei, who I think is planning on posting another RFC for the memslot
idea (I actually completely forgot that the memslot idea had been thought of
and posted a few years back).

[...]
> I wonder if it's over-engineered for the purpose.
>
> why not just introduce a KVM_CAP and allow the VMM to enable it?
> KVM doesn't need to know the exact source of the requirement...

Agreed. If we end up needing to grant the whole VM access for some reason,
just give userspace a direct toggle.
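(For comparison, the KVM_CAP route would be only a few lines in the per-VM
KVM_ENABLE_CAP path. KVM_CAP_NONCOHERENT_DMA is an invented name, and as Yan
notes earlier, userspace would then own any ref-counting across device
hot-plug/unplug.)

/*
 * Hypothetical per-VM KVM_ENABLE_CAP handling. The cap name is invented;
 * kvm_arch_register_noncoherent_dma() exists on x86 and is refcounted, so
 * unbalanced enables/disables would be userspace's problem here.
 */
static int enable_noncoherent_dma_cap(struct kvm *kvm,
				      struct kvm_enable_cap *cap)
{
	if (cap->flags)
		return -EINVAL;

	if (cap->args[0])
		kvm_arch_register_noncoherent_dma(kvm);
	else
		kvm_arch_unregister_noncoherent_dma(kvm);

	return 0;
}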
> +Yiwei
>
> On Fri, Dec 15, 2023, Kevin Tian wrote:
[...]
> > memslot flag has the potential to restrict the impact e.g. when using
> > clflush-before-read in migration?
>
> Yep, exactly. E.g. if KVM needs to ensure coherency when freeing memory back
> to the host kernel, then the memslot flag will allow for a much more targeted
> operation.
>
[...]
> I added Yiwei, who I think is planning on posting another RFC for the memslot
> idea (I actually completely forgot that the memslot idea had been thought of
> and posted a few years back).

We've deferred to Yan (Intel side) to drive the userspace opt-in. So it's up
to Yan to revise the series to be memslot-flag based. I'm okay with whatever
upstream folks think is safer for the opt-in. Thanks!

[...]
> Agreed. If we end up needing to grant the whole VM access for some reason,
> just give userspace a direct toggle.
On Mon, Dec 18, 2023 at 07:08:51AM -0800, Sean Christopherson wrote:
[...]
> Checking a memslot flag won't impact performance. KVM already has the memslot
> when creating SPTEs, e.g. the sole caller of vmx_get_mt_mask(), make_spte(),
> has access to the memslot.
>
> That isn't coincidental, KVM _must_ have the memslot to construct the SPTE,
> e.g. to retrieve the associated PFN, update write-tracking for shadow pages,
> etc.

Hi Sean,
Do you prefer to introduce a memslot flag KVM_MEM_DMA or KVM_MEM_WC?
For KVM_MEM_DMA, KVM needs to
(a) search the VMA for vma->vm_page_prot and convert it to a page cache mode
    (with pgprot2cachemode()?), or
(b) look up the memtype of the PFN by calling lookup_memtype(), similar to
    what pat_pfn_immune_to_uc_mtrr() does.

But pgprot2cachemode() and lookup_memtype() are not exported by x86 code now.

For KVM_MEM_WC, it requires the user to ensure the memory is actually mapped
WC, right?

Then vmx_get_mt_mask() just ignores guest PAT and programs host PAT as the EPT
type for the special memslot only, as below.
Is this understanding correct?

static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
	if (is_mmio)
		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	if (gfn_in_dma_slot(vcpu->kvm, gfn)) {
		u8 type = MTRR_TYPE_WRCOMB;
		//u8 type = pat_pfn_memtype(pfn);
		return (type << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
	}

	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
		else
			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
				VMX_EPT_IPAT_BIT;
	}

	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
}

BTW, since the special memslot must be exposed to the guest as a virtio GPU
BAR in order to prevent other guest drivers from accessing it, I wonder if
it's better to include some keyword like VIRTIO_GPU_BAR in the memslot flag
name.
On Tue, Dec 19, 2023 at 12:26:45PM +0800, Yan Zhao wrote:
> On Mon, Dec 18, 2023 at 07:08:51AM -0800, Sean Christopherson wrote:
[...]
> Hi Sean,
> Do you prefer to introduce a memslot flag KVM_MEM_DMA or KVM_MEM_WC?
[...]
> Then vmx_get_mt_mask() just ignores guest PAT and programs host PAT as the EPT
> type for the special memslot only, as below.
> Is this understanding correct?
[...]
> BTW, since the special memslot must be exposed to the guest as a virtio GPU
> BAR in order to prevent other guest drivers from accessing it, I wonder if
> it's better to include some keyword like VIRTIO_GPU_BAR in the memslot flag
> name.

Another choice is to add a memslot flag KVM_MEM_HONOR_GUEST_PAT, where the
user (e.g. QEMU) does special treatment for this kind of memslot (e.g.
skipping reading/writing to it in general paths).

@@ -7589,26 +7589,29 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	if (is_mmio)
 		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
 
+	if (in_slot_honor_guest_pat(vcpu->kvm, gfn))
+		return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
+
 	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
 			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
 		else
 			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
 				VMX_EPT_IPAT_BIT;
 	}
 
 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
 }
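(On the userspace side, opting a slot in via such a flag would be part of
normal memslot registration. KVM_MEM_HONOR_GUEST_PAT below is the proposed,
not yet existing, flag; everything else is the stock
KVM_SET_USER_MEMORY_REGION API.)

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdint.h>

/* Sketch: register a virtio GPU BAR slot with the hypothetical flag set. */
static int register_gpu_bar_slot(int vm_fd, uint32_t slot_id, uint64_t gpa,
				 uint64_t size, void *hva)
{
	struct kvm_userspace_memory_region region = {
		.slot            = slot_id,
		.flags           = KVM_MEM_HONOR_GUEST_PAT,	/* invented */
		.guest_phys_addr = gpa,				/* virtio GPU BAR */
		.memory_size     = size,
		.userspace_addr  = (uint64_t)hva,		/* e.g. WC mapping */
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}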
On Wed, Dec 20, 2023, Yan Zhao wrote:
> On Tue, Dec 19, 2023 at 12:26:45PM +0800, Yan Zhao wrote:
[...]
> Another choice is to add a memslot flag KVM_MEM_HONOR_GUEST_PAT, where the
> user (e.g. QEMU) does special treatment for this kind of memslot (e.g.
> skipping reading/writing to it in general paths).
>
> @@ -7589,26 +7589,29 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  	if (is_mmio)
>  		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
>
> +	if (in_slot_honor_guest_pat(vcpu->kvm, gfn))
> +		return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;

This is more along the lines of what I was thinking, though the name should be
something like KVM_MEM_NON_COHERENT_DMA, i.e. not x86-specific and not
contradictory for AMD (which already honors guest PAT).

I also vote to deliberately ignore MTRRs, i.e. start us on the path of ripping
those out. This is a new feature, so we have the luxury of defining KVM's ABI
for that feature, i.e. we can state that on x86 it honors guest PAT, but not
MTRRs.

Like so?

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d21f55f323ea..ed527acb2bd3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7575,7 +7575,8 @@ static int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
-static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio,
+			  struct kvm_memory_slot *slot)
 {
 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
 	 * memory aliases with conflicting memory types and sometimes MCEs.
@@ -7598,6 +7599,9 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	if (is_mmio)
 		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
 
+	if (kvm_memslot_has_non_coherent_dma(slot))
+		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
+
 	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

I like the idea of pulling the memtype from the host, but if we can make that
work then I don't see the need for a special memslot flag, i.e. just do it for
*all* SPTEs on VMX. I don't think we need a VMA for that, e.g. we should be
able to get the memtype from the host PTEs, just like we do the page size.

KVM_MEM_WC is a hard "no" for me. It's far too x86-centric, and as you alluded
to, it requires coordination from the guest, i.e. it is effectively limited to
paravirt scenarios.

> +
>  	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
>  		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>
[...]
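(To make "get the memtype from the host PTEs" concrete: on x86, a 4K PTE
selects one of the eight IA32_PAT MSR entries through its PWT (bit 3), PCD
(bit 4) and PAT (bit 7) bits. A sketch of just the decode step follows; the
helper name is an assumption.)

/*
 * Sketch only: compute the PAT index encoded in a host 4K PTE. The IA32_PAT
 * MSR entry at this index gives the memory type (WB/WC/UC/...), which KVM
 * would then translate into an EPT memtype.
 */
static u8 host_pte_pat_index(u64 pte)
{
	return  ((pte >> 3) & 1) |		/* PWT */
	       (((pte >> 4) & 1) << 1) |	/* PCD */
	       (((pte >> 7) & 1) << 2);		/* PAT (4K pages) */
}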
On Wed, Dec 20, 2023 at 06:12:55PM -0800, Sean Christopherson wrote:
> On Wed, Dec 20, 2023, Yan Zhao wrote:
[...]
> > @@ -7589,26 +7589,29 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> >  	if (is_mmio)
> >  		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
> >
> > +	if (in_slot_honor_guest_pat(vcpu->kvm, gfn))
> > +		return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
>
> This is more along the lines of what I was thinking, though the name should be
> something like KVM_MEM_NON_COHERENT_DMA, i.e. not x86-specific and not
> contradictory for AMD (which already honors guest PAT).
>
> I also vote to deliberately ignore MTRRs, i.e. start us on the path of ripping
> those out. This is a new feature, so we have the luxury of defining KVM's ABI
> for that feature, i.e. we can state that on x86 it honors guest PAT, but not
> MTRRs.
>
> Like so?

Yes, this looks good to me. Will refine the patch in the recommended way.

Only one remaining question from me is whether we could allow the user to set
the flag on every slot. If the user is allowed to set the flag on a system RAM
slot, it may cause problems, since guest drivers (other than the paravirt one)
may get memory allocated from that RAM slot and have their PAT settings take
effect, which is not desired.
Do we need to find a way to limit the eligible slots? e.g. check the VMAs of
the slot to ensure its vm_flags include VM_IO and VM_PFNMAP (see the sketch
below). Or just trust the user?

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
[...]
> @@ -7598,6 +7599,9 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  	if (is_mmio)
>  		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
>
> +	if (kvm_memslot_has_non_coherent_dma(slot))
> +		return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> +
>  	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
>  		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>
> I like the idea of pulling the memtype from the host, but if we can make that
> work then I don't see the need for a special memslot flag, i.e. just do it for
> *all* SPTEs on VMX. I don't think we need a VMA for that, e.g. we should be
> able to get the memtype from the host PTEs, just like we do the page size.

Right. Besides, after reading the host PTEs, we still need to decode the bits
into a memtype in a platform-specific way and convert that type to an EPT
type.

> KVM_MEM_WC is a hard "no" for me. It's far too x86-centric, and as you alluded
> to, it requires coordination from the guest, i.e. it is effectively limited to
> paravirt scenarios.

Got it!

[...]
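(For the VM_IO | VM_PFNMAP restriction raised above, a sketch of what the
check could look like when the flag is set on a memslot; the helper itself is
hypothetical, while the VMA walk uses existing kernel APIs.)

/*
 * Hypothetical validation at memslot-flag set time: require every VMA
 * backing the slot's userspace range to be a VM_IO | VM_PFNMAP mapping.
 */
static bool slot_range_is_pfnmap(struct kvm_memory_slot *slot)
{
	unsigned long addr = slot->userspace_addr;
	unsigned long end = addr + (slot->npages << PAGE_SHIFT);
	struct vm_area_struct *vma;
	bool ret = true;

	mmap_read_lock(current->mm);
	for (; addr < end; addr = vma->vm_end) {
		vma = find_vma(current->mm, addr);
		if (!vma || vma->vm_start > addr ||
		    (vma->vm_flags & (VM_IO | VM_PFNMAP)) !=
		    (VM_IO | VM_PFNMAP)) {
			ret = false;
			break;
		}
	}
	mmap_read_unlock(current->mm);

	return ret;
}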
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b07247b0b958..9f7223d298a2 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -44,6 +44,7 @@ config KVM
 	select KVM_XFER_TO_GUEST_WORK
 	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	select KVM_VFIO
+	select KVM_VIRTIO
 	select INTERVAL_TREE
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b1f92a0edc35..7f57fa902a0a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1394,6 +1394,9 @@ struct kvm_device_attr {
 #define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
 #define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	3
 
+#define KVM_DEV_VIRTIO_NONCOHERENT	1
+#define   KVM_DEV_VIRTIO_NONCOHERENT_SET	1
+
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
 #define KVM_DEV_TYPE_FSL_MPIC_20	KVM_DEV_TYPE_FSL_MPIC_20
@@ -1417,6 +1420,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_PV_TIME	KVM_DEV_TYPE_ARM_PV_TIME
 	KVM_DEV_TYPE_RISCV_AIA,
 #define KVM_DEV_TYPE_RISCV_AIA		KVM_DEV_TYPE_RISCV_AIA
+	KVM_DEV_TYPE_VIRTIO,
+#define KVM_DEV_TYPE_VIRTIO		KVM_DEV_TYPE_VIRTIO
 	KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 6793211a0b64..5dcba034593c 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -56,6 +56,9 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
 config KVM_VFIO
        bool
 
+config KVM_VIRTIO
+       bool
+
 config HAVE_KVM_INVALID_WAKEUPS
        bool
 
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 724c89af78af..614c4c4fe50e 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -7,6 +7,7 @@ KVM ?= ../../../virt/kvm
 kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
 kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
+kvm-$(CONFIG_KVM_VIRTIO) += $(KVM)/virtio.o
 kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index acd67fb40183..dca2040368da 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -61,6 +61,7 @@
 #include "async_pf.h"
 #include "kvm_mm.h"
 #include "vfio.h"
+#include "virtio.h"
 
 #include <trace/events/ipi.h>
 
@@ -6453,6 +6454,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (WARN_ON_ONCE(r))
 		goto err_vfio;
 
+	r = kvm_virtio_ops_init();
+	if (WARN_ON_ONCE(r))
+		goto err_virtio;
+
 	kvm_gmem_init(module);
 
 	/*
@@ -6468,6 +6473,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	return 0;
 
 err_register:
+	kvm_virtio_ops_exit();
+err_virtio:
 	kvm_vfio_ops_exit();
 err_vfio:
 	kvm_async_pf_deinit();
@@ -6503,6 +6510,7 @@ void kvm_exit(void)
 		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
 	kmem_cache_destroy(kvm_vcpu_cache);
 	kvm_vfio_ops_exit();
+	kvm_virtio_ops_exit();
 	kvm_async_pf_deinit();
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 	unregister_syscore_ops(&kvm_syscore_ops);
diff --git a/virt/kvm/virtio.c b/virt/kvm/virtio.c
new file mode 100644
index 000000000000..dbb25d9784c5
--- /dev/null
+++ b/virt/kvm/virtio.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM-VIRTIO device
+ *
+ */
+#include <linux/kvm_host.h>
+#include <linux/errno.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include "virtio.h"
+
+struct kvm_virtio {
+	struct mutex lock;
+	bool noncoherent;
+};
+
+static int kvm_virtio_set_noncoherent(struct kvm_device *dev, long attr,
+				      void __user *arg)
+{
+	struct kvm_virtio *kv = dev->private;
+
+	/*
+	 * Currently, only set to noncoherent is allowed, and therefore a virtio
+	 * device is not allowed to switch back to coherent once it's set to
+	 * noncoherent.
+	 * User arg is also not checked as the attr name has indicated that the
+	 * purpose is to set to noncoherent.
+	 */
+	if (attr != KVM_DEV_VIRTIO_NONCOHERENT_SET)
+		return -ENXIO;
+
+	mutex_lock(&kv->lock);
+	if (kv->noncoherent)
+		goto out;
+
+	kv->noncoherent = true;
+	kvm_arch_register_noncoherent_dma(dev->kvm);
+out:
+	mutex_unlock(&kv->lock);
+	return 0;
+}
+
+static int kvm_virtio_set_attr(struct kvm_device *dev,
+			       struct kvm_device_attr *attr)
+{
+	switch (attr->group) {
+	case KVM_DEV_VIRTIO_NONCOHERENT:
+		return kvm_virtio_set_noncoherent(dev, attr->attr,
+						  u64_to_user_ptr(attr->addr));
+	}
+
+	return -ENXIO;
+}
+
+static int kvm_virtio_has_attr(struct kvm_device *dev,
+			       struct kvm_device_attr *attr)
+{
+	switch (attr->group) {
+	case KVM_DEV_VIRTIO_NONCOHERENT:
+		switch (attr->attr) {
+		case KVM_DEV_VIRTIO_NONCOHERENT_SET:
+			return 0;
+		}
+
+		break;
+	}
+
+	return -ENXIO;
+}
+
+static void kvm_virtio_release(struct kvm_device *dev)
+{
+	struct kvm_virtio *kv = dev->private;
+
+	if (kv->noncoherent)
+		kvm_arch_unregister_noncoherent_dma(dev->kvm);
+	kfree(kv);
+	kfree(dev);	/* alloc by kvm_ioctl_create_device, free by .release */
+}
+
+static int kvm_virtio_create(struct kvm_device *dev, u32 type);
+
+static struct kvm_device_ops kvm_virtio_ops = {
+	.name = "kvm-virtio",
+	.create = kvm_virtio_create,
+	.release = kvm_virtio_release,
+	.set_attr = kvm_virtio_set_attr,
+	.has_attr = kvm_virtio_has_attr,
+};
+
+static int kvm_virtio_create(struct kvm_device *dev, u32 type)
+{
+	struct kvm_virtio *kv;
+
+	if (type != KVM_DEV_TYPE_VIRTIO)
+		return -ENODEV;
+
+	/*
+	 * This kvm_virtio device is created per virtio device.
+	 * Its default noncoherent state is false.
+	 */
+	kv = kzalloc(sizeof(*kv), GFP_KERNEL_ACCOUNT);
+	if (!kv)
+		return -ENOMEM;
+
+	mutex_init(&kv->lock);
+
+	dev->private = kv;
+
+	return 0;
+}
+
+int kvm_virtio_ops_init(void)
+{
+	return kvm_register_device_ops(&kvm_virtio_ops, KVM_DEV_TYPE_VIRTIO);
+}
+
+void kvm_virtio_ops_exit(void)
+{
+	kvm_unregister_device_ops(KVM_DEV_TYPE_VIRTIO);
+}
diff --git a/virt/kvm/virtio.h b/virt/kvm/virtio.h
new file mode 100644
index 000000000000..0353398ee1f2
--- /dev/null
+++ b/virt/kvm/virtio.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_VIRTIO_H
+#define __KVM_VIRTIO_H
+
+#ifdef CONFIG_KVM_VIRTIO
+int kvm_virtio_ops_init(void);
+void kvm_virtio_ops_exit(void);
+#else
+static inline int kvm_virtio_ops_init(void)
+{
+	return 0;
+}
+static inline void kvm_virtio_ops_exit(void)
+{
+}
+#endif
+
+#endif
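(For reference, a userspace usage sketch for the device this patch introduces;
error handling is trimmed, and the flow mirrors other KVM devices such as
kvm-vfio. The device type and attribute constants come from the patch above.)

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Create one kvm-virtio device per virtio device and mark it noncoherent;
 * per the patch, there is no way to switch the device back to coherent.
 */
static int mark_virtio_noncoherent(int vm_fd)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VIRTIO };
	struct kvm_device_attr attr = {
		.group = KVM_DEV_VIRTIO_NONCOHERENT,
		.attr  = KVM_DEV_VIRTIO_NONCOHERENT_SET,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	/* Bumps the VM's noncoherent DMA count via
	 * kvm_arch_register_noncoherent_dma(). */
	return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
}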