
[v1] vhost-pci-net: design vhost-pci-net for the transmission of network packets between VMs

Message ID 1477292966-144175-1-git-send-email-wei.w.wang@intel.com (mailing list archive)
State New, archived

Commit Message

Wang, Wei W Oct. 24, 2016, 7:09 a.m. UTC
Listed below are the issues that we discussed before, along with the
proposed solutions.

1. Interrupt support
1) vhost-pci-net
According to the virtio spec, "The device MUST present at least one
notification capability", which implies that a virtio device can have
multiple notification capabilities.

Therefore, a second notification capability
(VIRTIO_PCI_CAP_PEER_NOTIFY_CFG) will be implemented for the
vhost-pci device to notify its peer device's (virtio-net) virtq. The
fd used for the ioeventfd of a notification address is the same fd used
for the irqfd of the corresponding peer virtq. So, when the vhost-pci
driver writes to that notification address (i.e. kicks the peer virtq),
the related virtq interrupt will be injected into the virtio-net driver.
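
The fd sharing described above can be modelled in userspace with a
plain Linux eventfd. This is only a sketch of the mechanism, not the
QEMU implementation: in a real setup the same fd would presumably be
registered with KVM as an ioeventfd in one VM and as an irqfd in the
other. The helper names kick_peer_virtq() and consume_kicks() are
hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: models the signal raised when the vhost-pci
 * driver writes to the peer notification address (the ioeventfd side).
 * Returns 0 on success, -1 on failure.
 */
int kick_peer_virtq(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: models the consumer of the same fd (the irqfd
 * side). Returns the number of kicks accumulated since the last call,
 * 0 if none are pending, or -1 on error. The fd is assumed to have
 * been created with EFD_NONBLOCK.
 */
int64_t consume_kicks(int fd)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    return (int64_t)count;
}
```

Because both sides refer to one eventfd, no extra forwarding step is
needed between the kick and the interrupt injection, which is the
point of sharing the fd.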

2) virtio-net
The virtio-net device also needs to notify the peer on the other end
(vhost-pci-net). The ioeventfd corresponding to the TX virtq uses
the same fd as that of the peer RX virtq's irqfd. This is all done
with only one notification capability.

2. How to handle guest crash?
1) vhost-pci-net guest crash
Typically, the guest will be killed and re-created by the admin. The
admin then uses the QEMU monitor or QMP to operate on the virtio-net
guest: "device_del" the old virtio-net device, and "device_add" a new
virtio-net which connects to the newly booted vhost-pci-net guest.

2) virtio-net guest crash
When the virtio-net guest is killed by the admin, the server will be
notified due to "close(fd)", which gives it a chance to destroy the
related vhost-pci-net device. When the guest is re-created and boots,
a new vhost-pci device will be created and initialized for it.
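
As a rough illustration of how the server side could notice the
"close(fd)", the sketch below checks a connection fd for end-of-file.
It assumes the server talks to the virtio-net side over a stream
socket (e.g. a UNIX domain socket); that transport and the helper name
peer_disconnected() are assumptions, not something this patch defines.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the peer has closed its end of the
 * stream socket, 0 if the connection still looks alive, -1 on any
 * other error. MSG_PEEK leaves pending data in the socket buffer and
 * MSG_DONTWAIT keeps the check from blocking.
 */
int peer_disconnected(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, sizeof(byte), MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;               /* EOF: the peer closed the connection */
    if (n > 0)
        return 0;               /* data pending, peer still connected */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* nothing to read, peer still connected */
    if (errno == ECONNRESET)
        return 1;               /* connection reset by the peer */
    return -1;
}
```

A server would typically call such a check when its event loop reports
the fd as readable, and destroy the related vhost-pci-net device when
it returns 1.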

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 content.tex | 240 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 238 insertions(+), 2 deletions(-)

Comments

Wang, Wei W Nov. 2, 2016, 5:45 a.m. UTC | #1
On Monday, October 24, 2016 3:09 PM, Wei Wang wrote:
> To: virtio-comment@lists.oasis-open.org; qemu-devel@nongnu.org;
> mst@redhat.com; marcandre.lureau@gmail.com; stefanha@redhat.com;
> pbonzini@redhat.com
> Cc: Wang, Wei W <wei.w.wang@intel.com>
> Subject: [PATCH v1] vhost-pci-net: design vhost-pci-net for the transmission of
> network packets between VMs


> +\subsection{Device configuration layout}\label{sec:Device Types /
> +Vhost-pci Device / Device configuration layout}
> +  None currently defined.

I thought about it more - I think it would be better to define a device-specific config field, "rxq_num", so that the driver can allocate and initialize the rx virtqs during initialization (i.e. in probe()), when the controlq pair is not yet ready to be used for interaction between the device and driver.
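
A minimal sketch of what such a config field could look like is given
below. Only the name "rxq_num" comes from the proposal above; the
struct name, the 16-bit little-endian encoding, and the helper
functions are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical device-specific config space. Only rxq_num comes from
 * the proposal; its width and little-endian byte order are assumed.
 */
struct vhost_pci_net_config {
    uint8_t rxq_num[2];
};

/* The control receiveq and control transmitq always exist. */
#define VHOST_PCI_NET_CTRL_VQS 2

/* Decode rxq_num from its little-endian byte representation. */
uint16_t vhost_pci_net_rxq_num(const struct vhost_pci_net_config *cfg)
{
    return (uint16_t)(cfg->rxq_num[0] | (cfg->rxq_num[1] << 8));
}

/* Total number of virtqueues the driver would set up in probe(). */
uint32_t vhost_pci_net_total_vqs(const struct vhost_pci_net_config *cfg)
{
    return VHOST_PCI_NET_CTRL_VQS + (uint32_t)vhost_pci_net_rxq_num(cfg);
}
```

With a field like this, the driver can size its virtqueue array before
any control message has been exchanged.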

Best,
Wei

Patch

diff --git a/content.tex b/content.tex
index 222b78e..4f67e94 100644
--- a/content.tex
+++ b/content.tex
@@ -3115,6 +3115,10 @@  features.
 
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
+
+\item[VIRTIO_NET_F_PEER_CONNECTION(24)] Device supports connection to
+    a peer device.
+
 \end{description}
 
 \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
@@ -3138,6 +3142,7 @@  Some networking feature bits require other networking feature bits
 \item[VIRTIO_NET_F_GUEST_ANNOUNCE] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_MQ] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_PEER_CONNECTION] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_STATUS.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3154,13 +3159,14 @@  were needed.
 
 Three driver-read-only configuration fields are currently defined. The \field{mac} address field
 always exists (though is only valid if VIRTIO_NET_F_MAC is set), and
-\field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
+\field{status} only exists if VIRTIO_NET_F_STATUS is set. Three
 read-only bits (for the driver) are currently defined for the status field:
-VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
+VIRTIO_NET_S_LINK_UP, VIRTIO_NET_S_ANNOUNCE and VIRTIO_NET_S_PEER_CONNECTION.
 
 \begin{lstlisting}
 #define VIRTIO_NET_S_LINK_UP     1
 #define VIRTIO_NET_S_ANNOUNCE    2
+#define VIRTIO_NET_S_PEER_CONNECTION 4
 \end{lstlisting}
 
 The following driver-read-only field, \field{max_virtqueue_pairs} only exists if
@@ -3204,6 +3210,10 @@  level ethernet header length) size with \field{gso_type} NONE or ECN, and do
 so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
 negotiated.
 
+Before the device turns on or off the VIRTIO_NET_S_PEER_CONNECTION
+bit in \field{status}, it MUST sync up with the connected peer
+device and get an acknowledgement.
+
 \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
 
 A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
@@ -3226,6 +3236,11 @@  If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
 size exceeding the value of \field{mtu} (plus low level ethernet header length)
 with \field{gso_type} NONE or ECN.
 
+If the driver negotiates VIRTIO_NET_F_PEER_CONNECTION, it SHOULD NOT
+send packets to any transmitq when the VIRTIO_NET_S_PEER_CONNECTION
+bit is off in \field{status}. The driver MUST NOT unload until the
+VIRTIO_NET_S_PEER_CONNECTION bit is turned off.
+
 \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
 \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
 When using the legacy interface, transitional devices and drivers
@@ -4046,6 +4061,23 @@  MUST format \field{offloads}
 according to the native endian of the guest rather than
 (necessarily when not using the legacy interface) little-endian.
 
+\paragraph{Peer Connection}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Peer Connection}
+If the VIRTIO_NET_F_PEER_CONNECTION feature bit is negotiated, the
+driver can send a control command to the device to turn on or off
+the VIRTIO_NET_S_PEER_CONNECTION bit in \field{status}.
+
+\begin{lstlisting}
+ #define VIRTIO_NET_PEER_CONNECTION       6
+ #define VIRTIO_NET_PEER_CONNECTION_OFF   0
+ #define VIRTIO_NET_PEER_CONNECTION_ON    1
+\end{lstlisting}
+
+The VIRTIO_NET_PEER_CONNECTION_OFF command is used by the driver to
+request the device to turn off the VIRTIO_NET_S_PEER_CONNECTION bit
+in \field{status}.
+The VIRTIO_NET_PEER_CONNECTION_ON command is used by the driver to
+request the device to turn on the VIRTIO_NET_S_PEER_CONNECTION bit
+in \field{status}.
 
 \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
 Types / Network Device / Legacy Interface: Framing Requirements}
@@ -5811,6 +5843,210 @@  descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{Vhost-pci Net Device}\label{sec:Device Types / Vhost-pci Net Device}
+
+The vhost-pci net device enables point-to-point transmission of
+network packets between two isolated address spaces (e.g. virtual
+machines). An instance of the vhost-pci net device transmits packets
+to and receives packets from its peer device, which is usually a
+virtio net device in another address space.
+
+\subsection{Device ID}\label{sec:Device Types / Vhost-pci Net Device / Device ID}
+  TBD
+
+\subsection{Virtqueues}\label{sec:Device Types / Vhost-pci Net Device / Virtqueues}
+
+\begin{description}
+\item[0] control receiveq
+\item[1] control transmitq
+\item[2] receiveq1
+\item[\ldots]
+\item[N+1] receiveqN
+\end{description}
+
+N equals the number of peer device transmitqs.
+
+\subsection{Feature bits}\label{sec:Device Types / Vhost-pci Net Device / Feature bits}
+
+The device is created with the feature bits that have been negotiated
+with the peer device. If the driver only accepts a subset of the
+feature bits, the device needs to re-negotiate the subset of feature
+bits with the peer device, which may trigger a reset of the peer
+device.
+
+\begin{description}
+\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Virtio-net can receive TSOv4.
+
+\item[VIRTIO_NET_F_GUEST_TSO6 (8)] Virtio-net can receive TSOv6.
+
+\item[VIRTIO_NET_F_GUEST_ECN (9)] Virtio-net can receive TSO with ECN.
+
+\item[VIRTIO_NET_F_GUEST_UFO (10)] Virtio-net can receive UFO.
+
+\item[VIRTIO_NET_F_HOST_TSO4 (11)] Vhost-pci-net supports TSOv4.
+
+\item[VIRTIO_NET_F_HOST_TSO6 (12)] Vhost-pci-net supports TSOv6.
+
+\item[VIRTIO_NET_F_HOST_ECN (13)] Vhost-pci-net supports TSO with ECN.
+
+\item[VIRTIO_NET_F_HOST_UFO (14)] Vhost-pci-net supports UFO.
+
+\item[VIRTIO_NET_F_MRG_RXBUF (15)] Virtio-net can merge receive buffers.
+
+\item[VIRTIO_NET_F_PEER_CONNECTION (24)] Virtio-net supports connection
+    to a peer device.
+
+\item[VHOST_F_LOG_ALL (26)] Vhost-pci-net supports dirty page logging.
+\end{description}
+
+\subsection{Device configuration layout}\label{sec:Device Types / Vhost-pci Device / Device configuration layout}
+  None currently defined.
+
+\subsection{Device Initialization}\label{sec:Device Types / Vhost-pci Device / Device Initialization}
+
+The driver would perform a typical initialization routine like so:
+
+\begin{enumerate}
+\item Identify and initialize the control receiveq and control transmitq.
+
+\item Fill the control receiveq with buffers.
+
+\item Generate a random local MAC address.
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / Vhost-pci Net Device / Device Operation}
+
+\subsubsection{Control Virtqueue}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue}
+
+The pair of control virtqueues is used to exchange configuration
+messages between the device and driver. All configuration
+messages use the following structure:
+
+\begin{lstlisting}
+struct vhost_pci_ctrl {
+        u8  class;
+        u8  command;
+        u8  command-specific-data[];
+};
+\end{lstlisting}
+
+\paragraph{Peer Connection}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Connection}
+\begin{lstlisting}
+#define VHOST_PCI_CTRL_PEER_CONNECTION 0
+#define VHOST_PCI_CTRL_PEER_CONNECTION_OFF 0
+#define VHOST_PCI_CTRL_PEER_CONNECTION_ON 1
+\end{lstlisting}
+
+The device maintains the status of connection to the peer
+device. The VHOST_PCI_CTRL_PEER_CONNECTION_OFF and
+VHOST_PCI_CTRL_PEER_CONNECTION_ON commands are used by the
+device to send the status update to the driver through the
+control receiveq.
+
+The VHOST_PCI_CTRL_PEER_CONNECTION_OFF command is also used by the
+driver, through the control transmitq, to request that the device
+disconnect from the peer device.
+
+\devicenormative{\subparagraph}{Peer Connection}{Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Connection}
+
+When receiving a VHOST_PCI_CTRL_PEER_CONNECTION_OFF command from
+the driver, the device SHOULD sync up with the peer device for
+disconnection. Upon receiving an acknowledgement of disconnection
+from the peer device, it SHOULD update the driver with a
+VHOST_PCI_CTRL_PEER_CONNECTION_OFF command.
+
+\drivernormative{\subparagraph}{Peer Connection}{Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Connection}
+
+The driver SHOULD NOT access any peer device memory before it is
+updated with a VHOST_PCI_CTRL_PEER_CONNECTION_ON command.
+
+The driver SHOULD NOT access any peer device memory after it is
+updated with a VHOST_PCI_CTRL_PEER_CONNECTION_OFF command.
+
+The driver MUST NOT unload until it is updated by the device with
+a VHOST_PCI_CTRL_PEER_CONNECTION_OFF command.
+
+\paragraph{Peer Memory Info}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Memory Info}
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_mem_info {
+        u64 peer_mem;
+        u8 other-mem-info[];
+};
+#define VHOST_PCI_CTRL_PEER_MEM_INFO 1
+#define VHOST_PCI_CTRL_PEER_MEM_INFO_BAR 0
+#define VHOST_PCI_CTRL_PEER_MEM_INFO_GVA 1
+\end{lstlisting}
+
+The device obtains the memory info from the peer device, and sends it
+to the driver.
+
+For command VHOST_PCI_CTRL_PEER_MEM_INFO_BAR, \field{peer_mem} stores
+the id of the BAR that holds the peer memory.
+
+For command VHOST_PCI_CTRL_PEER_MEM_INFO_GVA, \field{peer_mem} stores
+the virtual address that maps to the start of the peer memory. The
+driver can directly use this address to access the peer memory.
+
+The \field{other-mem-info} field stores additional peer memory info
+for the driver to reference; its format is defined according to the
+implementation's needs.
+
+\drivernormative{\subparagraph}{Peer Memory Info}{Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Memory Info}
+If the driver receives a VHOST_PCI_CTRL_PEER_MEM_INFO_BAR command,
+it SHOULD map the peer memory through the BAR specified in
+\field{peer_mem}. The address that maps to the start of the peer
+memory SHOULD be sent to the device using a
+VHOST_PCI_CTRL_PEER_MEM_INFO_GVA command through the
+control transmitq.
+
+\devicenormative{\subparagraph}{Peer Memory Info}{Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Memory Info}
+The device SHOULD send a VHOST_PCI_CTRL_PEER_MEM_INFO_GVA command to
+the driver if it internally records a virtual address of the peer
+memory. Otherwise, it SHOULD send a VHOST_PCI_CTRL_PEER_MEM_INFO_BAR
+command to the driver.
+
+\paragraph{Peer Virtq Info}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Virtq Info}
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_peer_virtq_info {
+        u32 virtq_num;
+        struct virtq peer_vq[];
+};
+#define VHOST_PCI_CTRL_PEER_VIRTQ_INFO 2
+\end{lstlisting}
+
+The device obtains the virtq info from the peer device, and sends
+it to the driver. The \field{virtq_num} stores the total number
+of virtqs.
+
+\drivernormative{\subparagraph}{Peer Virtq Info}{Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Peer Virtq Info}
+The driver SHOULD allocate (\field{virtq_num} / 2) receiveqs.
+The driver SHOULD share the \field{virtq_num} virtqs in \field{peer_vq[]}, using each peer transmitq as its shared receiveq and each peer receiveq as its shared transmitq.
+
+\paragraph{Dirty Page Logging}\label{sec:Device Types / Vhost-pci Net Device / Device Operation / Control Virtqueue / Dirty Page Logging}
+
+\begin{lstlisting}
+struct vhost_pci_ctrl_dirty_page_log_base {
+        u64 gpa;
+        u64 size;
+};
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING 3
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_OFF 0
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_ON  1
+#define VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_SET_LOG_BASE 2
+\end{lstlisting}
+
+The device sends the VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_OFF or
+VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_ON command to the driver to turn
+off or on the dirty logging mode.
+
+The device sends the VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_SET_LOG_BASE
+command to the driver to set the dirty logging bitmap.
+The command-specific-data for
+VHOST_PCI_CTRL_DIRTY_PAGE_LOGGING_SET_LOG_BASE includes a 64-bit
+guest physical address and a 64-bit size of the bitmap.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined: