
[RFC,07/21] pci/tdisp: Introduce tsm module

Message ID 20240823132137.336874-8-aik@amd.com (mailing list archive)
State New, archived
Series Secure VFIO, TDISP, SEV TIO

Commit Message

Alexey Kardashevskiy Aug. 23, 2024, 1:21 p.m. UTC
The module responsibilities are:
1. detect TEE support in a device and create nodes in the device's sysfs
entry;
2. allow binding a PCI device to a VM for passing it through in a trusted
manner;
3. store measurements/certificates/reports and provide access to those for
the userspace via sysfs.

This relies on the platform to register a set of callbacks,
for both host and guest.

And tdi_enabled in the device struct.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
 drivers/virt/coco/Makefile      |    1 +
 include/linux/device.h          |    5 +
 include/linux/tsm.h             |  263 ++++
 drivers/virt/coco/tsm.c         | 1336 ++++++++++++++++++++
 Documentation/virt/coco/tsm.rst |   62 +
 drivers/virt/coco/Kconfig       |   11 +
 6 files changed, 1678 insertions(+)

Comments

Jason Gunthorpe Aug. 27, 2024, 12:32 p.m. UTC | #1
On Fri, Aug 23, 2024 at 11:21:21PM +1000, Alexey Kardashevskiy wrote:
> The module responsibilities are:
> 1. detect TEE support in a device and create nodes in the device's sysfs
> entry;
> 2. allow binding a PCI device to a VM for passing it through in a trusted
> manner;

Binding devices to VMs and managing their lifecycle is the purview of
VFIO and iommufd, it should not be exposed via weird sysfs calls like
this. You can't build the right security model without being inside
the VFIO context.

As I said in the other email, it seems like the PSP and the iommu
driver need to coordinate to ensure the two DTEs are consistent, and
solve the other sequencing problems you seem to have.

I'm not convinced this should be in some side module - it seems like
this is possibly more logically integrated as part of the iommu..

> +static ssize_t tsm_dev_connect_store(struct device *dev, struct device_attribute *attr,
> +				     const char *buf, size_t count)
> +{
> +	struct tsm_dev *tdev = tsm_dev_get(dev);
> +	unsigned long val;
> +	ssize_t ret = -EIO;
> +
> +	if (kstrtoul(buf, 0, &val) < 0)
> +		ret = -EINVAL;
> +	else if (val && !tdev->connected)
> +		ret = tsm_dev_connect(tdev, tsm.private_data, val);
> +	else if (!val && tdev->connected)
> +		ret = tsm_dev_reclaim(tdev, tsm.private_data);
> +
> +	if (!ret)
> +		ret = count;
> +
> +	tsm_dev_put(tdev);
> +
> +	return ret;
> +}
> +
> +static ssize_t tsm_dev_connect_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct tsm_dev *tdev = tsm_dev_get(dev);
> +	ssize_t ret = sysfs_emit(buf, "%u\n", tdev->connected);
> +
> +	tsm_dev_put(tdev);
> +	return ret;
> +}
> +
> +static DEVICE_ATTR_RW(tsm_dev_connect);

Please do a much better job explaining the uAPIS you are trying to
build in all the commit messages and how you expect them to be used.

Picking this stuff out of a 6k loc series is a bit tricky

Jason
Alexey Kardashevskiy Aug. 28, 2024, 3 a.m. UTC | #2
On 27/8/24 22:32, Jason Gunthorpe wrote:
> On Fri, Aug 23, 2024 at 11:21:21PM +1000, Alexey Kardashevskiy wrote:
>> The module responsibilities are:
>> 1. detect TEE support in a device and create nodes in the device's sysfs
>> entry;
>> 2. allow binding a PCI device to a VM for passing it through in a trusted
>> manner;
> 
> Binding devices to VMs and managing their lifecycle is the purview of
> VFIO and iommufd, it should not be exposed via weird sysfs calls like
> this. You can't build the right security model without being inside
> the VFIO context.

Is "extend the MAP_DMA uAPI to accept {gmemfd, offset}" enough for the 
VFIO context, or there is more and I am missing it?

> As I said in the other email, it seems like the PSP and the iommu
> driver need to coordinate to ensure the two DTEs are consistent, and
> solve the other sequencing problems you seem to have.

Correct. That DTE/sDTE hack is only there to show the bare minimum 
needed to get IDE+TDISP going; we will definitely address this better.

> I'm not convinced this should be in some side module - it seems like
> this is possibly more logically integrated as part of the iommu..

There are two things which the module's sysfs interface tries dealing with:

1) device authentication (by the PSP, contrary to Lukas'es host-based 
CMA) and PCIe link encryption (PCIe IDE keys only programmable via the PSP);

2) VFIO + Coco VM.

The first part does not touch VFIO or IOMMU, and the sysfs interface 
provides API mostly for 1).

The proposed sysfs interface does not do VFIO or IOMMUFD binding though, 
as that is indeed weird even for test/bringup purposes; the only somewhat 
useful sysfs bit here is the interface report (a PCI/TDISP thing which 
comes from the device).

Besides sysfs, the module provides common "verbs" to be defined by the 
platform (which is right now a reduced set of the AMD PSP operations but 
the hope is it can be generalized); and the module also does PCIe DOE 
bouncing (which is also not uncommon). Part of this exercise is trying 
to find some common ground (if it is possible), hence routing everything 
via this module.
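
(For context, the platform side boils down to filling a struct tsm_ops 
with those verbs and calling tsm_set_ops(); the my_* names below are 
invented, and a positive return from a hook asks tsm.ko to bounce the 
SPDM payload via DOE, as the loops in this patch do:)

/* Illustration only, not part of the patch */
static struct tsm_ops my_host_ops = {
	.dev_connect = my_dev_connect,	/* 0/-errno, or >0 = DOE protocol to forward */
	.dev_reclaim = my_dev_reclaim,
	.dev_status  = my_dev_status,
	.tdi_bind    = my_tdi_bind,
	.tdi_reclaim = my_tdi_reclaim,
	.tdi_status  = my_tdi_status,
};

	/* somewhere in the platform (PSP) driver probe */
	tsm_set_ops(&my_host_ops, my_platform_context);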


>> +static ssize_t tsm_dev_connect_store(struct device *dev, struct device_attribute *attr,
>> +				     const char *buf, size_t count)
>> +{
>> +	struct tsm_dev *tdev = tsm_dev_get(dev);
>> +	unsigned long val;
>> +	ssize_t ret = -EIO;
>> +
>> +	if (kstrtoul(buf, 0, &val) < 0)
>> +		ret = -EINVAL;
>> +	else if (val && !tdev->connected)
>> +		ret = tsm_dev_connect(tdev, tsm.private_data, val);
>> +	else if (!val && tdev->connected)
>> +		ret = tsm_dev_reclaim(tdev, tsm.private_data);
>> +
>> +	if (!ret)
>> +		ret = count;
>> +
>> +	tsm_dev_put(tdev);
>> +
>> +	return ret;
>> +}
>> +
>> +static ssize_t tsm_dev_connect_show(struct device *dev, struct device_attribute *attr, char *buf)
>> +{
>> +	struct tsm_dev *tdev = tsm_dev_get(dev);
>> +	ssize_t ret = sysfs_emit(buf, "%u\n", tdev->connected);
>> +
>> +	tsm_dev_put(tdev);
>> +	return ret;
>> +}
>> +
>> +static DEVICE_ATTR_RW(tsm_dev_connect);
> 
> Please do a much better job explaining the uAPIS you are trying to
> build in all the commit messages and how you expect them to be used.

True and sorry about that, will do better...

> Picking this stuff out of a 6k loc series is a bit tricky

Thanks for the comments!

> 
> Jason
Jonathan Cameron Aug. 28, 2024, 3:04 p.m. UTC | #3
On Fri, 23 Aug 2024 23:21:21 +1000
Alexey Kardashevskiy <aik@amd.com> wrote:

> The module responsibilities are:
> 1. detect TEE support in a device and create nodes in the device's sysfs
> entry;
> 2. allow binding a PCI device to a VM for passing it through in a trusted
> manner;
> 3. store measurements/certificates/reports and provide access to those for
> the userspace via sysfs.
> 
> This relies on the platform to register a set of callbacks,
> for both host and guest.
> 
> And tdi_enabled in the device struct.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>



Main thing missing here is a sysfs ABI doc.

Otherwise, some random comments as I get my head around it.

Jonathan


> diff --git a/include/linux/tsm.h b/include/linux/tsm.h
> new file mode 100644
> index 000000000000..d48eceaf5bc0
> --- /dev/null
> +++ b/include/linux/tsm.h
> @@ -0,0 +1,263 @@
...


> +/* Physical device descriptor responsible for IDE/TDISP setup */
> +struct tsm_dev {
> +	struct kref kref;
> +	const struct attribute_group *ag;
> +	struct pci_dev *pdev; /* Physical PCI function #0 */
> +	struct tsm_spdm spdm;
> +	struct mutex spdm_mutex;

Kind of obvious, but still good to document what data this is protecting.

> +
> +	u8 tc_mask;
> +	u8 cert_slot;
> +	u8 connected;
> +	struct {
> +		u8 enabled:1;
> +		u8 enable:1;
> +		u8 def:1;
> +		u8 dev_ide_cfg:1;
> +		u8 dev_tee_limited:1;
> +		u8 rootport_ide_cfg:1;
> +		u8 rootport_tee_limited:1;
> +		u8 id;
> +	} selective_ide[256];
> +	bool ide_pre;
> +
> +	struct tsm_blob *meas;
> +	struct tsm_blob *certs;
> +
> +	void *data; /* Platform specific data */
> +};
> +

> diff --git a/drivers/virt/coco/tsm.c b/drivers/virt/coco/tsm.c
> new file mode 100644
> index 000000000000..e90455a0267f
> --- /dev/null
> +++ b/drivers/virt/coco/tsm.c
...

>
> +struct tsm_tdi *tsm_tdi_get(struct device *dev)
> +{
> +	struct tsm_tdi *tdi = dev->tdi;
> +
> +	return tdi;
	return dev->tdi;
seems fine to me.

> +}
> +EXPORT_SYMBOL_GPL(tsm_tdi_get);
> +
> +static int spdm_forward(struct tsm_spdm *spdm, u8 type)
> +{
> +	struct pci_doe_mb *doe_mb;
> +	int rc;
> +
> +	if (type == PCI_DOE_PROTOCOL_SECURED_CMA_SPDM)
> +		doe_mb = spdm->doe_mb_secured;
> +	else if (type == PCI_DOE_PROTOCOL_CMA_SPDM)
> +		doe_mb = spdm->doe_mb;
> +	else
> +		return -EINVAL;
> +
> +	if (!doe_mb)
> +		return -EFAULT;
> +
> +	rc = pci_doe(doe_mb, PCI_VENDOR_ID_PCI_SIG, type,
> +		     spdm->req, spdm->req_len, spdm->rsp, spdm->rsp_len);
> +	if (rc >= 0)
	if (rc < 0)
		return rc;

	spdm->rsp_len = rc;
	
	return 0;

> +		spdm->rsp_len = rc;
> +
> +	return rc;
> +}
> +

> +
> +static int tsm_ide_refresh(struct tsm_dev *tdev, void *private_data)
> +{
> +	int ret;
> +
> +	if (!tsm.ops->ide_refresh)
> +		return -EPERM;
> +
> +	mutex_lock(&tdev->spdm_mutex);
guard() and early returns.

> +	while (1) {
> +		ret = tsm.ops->ide_refresh(tdev, private_data);
> +		if (ret <= 0)
> +			break;
> +
> +		ret = spdm_forward(&tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdev->spdm_mutex);
> +
> +	return ret;
> +}
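
i.e. something along these lines (untested sketch, keeps the loop as-is 
and relies on <linux/cleanup.h>):

static int tsm_ide_refresh(struct tsm_dev *tdev, void *private_data)
{
	int ret;

	if (!tsm.ops->ide_refresh)
		return -EPERM;

	guard(mutex)(&tdev->spdm_mutex);
	while (1) {
		ret = tsm.ops->ide_refresh(tdev, private_data);
		if (ret <= 0)
			return ret;

		ret = spdm_forward(&tdev->spdm, ret);
		if (ret < 0)
			return ret;
	}
}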
> +
> +static void tsm_tdi_reclaim(struct tsm_tdi *tdi, void *private_data)
> +{
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->tdi_reclaim))
> +		return;
> +
> +	mutex_lock(&tdi->tdev->spdm_mutex);
guard() and early returns.

> +	while (1) {
> +		ret = tsm.ops->tdi_reclaim(tdi, private_data);
> +		if (ret <= 0)
> +			break;
> +
> +		ret = spdm_forward(&tdi->tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdi->tdev->spdm_mutex);
> +}


> +static int tsm_tdi_status(struct tsm_tdi *tdi, void *private_data, struct tsm_tdi_status *ts)
> +{
> +	struct tsm_tdi_status tstmp = { 0 };
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->tdi_status))
> +		return -EPERM;
> +
> +	mutex_lock(&tdi->tdev->spdm_mutex);
Perhaps scoped_guard() if it doesn't make sense to set *ts on error
(see below).
> +	while (1) {
> +		ret = tsm.ops->tdi_status(tdi, private_data, &tstmp);
> +		if (ret <= 0)
> +			break;
> +
> +		ret = spdm_forward(&tdi->tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdi->tdev->spdm_mutex);
> +
> +	*ts = tstmp;
Want to set this even on error?
> +
> +	return ret;
> +}
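
i.e. perhaps (sketch; only copies the status out on success):

	scoped_guard(mutex, &tdi->tdev->spdm_mutex) {
		while (1) {
			ret = tsm.ops->tdi_status(tdi, private_data, &tstmp);
			if (ret <= 0)
				break;

			ret = spdm_forward(&tdi->tdev->spdm, ret);
			if (ret < 0)
				break;
		}
	}

	if (!ret)
		*ts = tstmp;

	return ret;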


> +static ssize_t tsm_meas_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct tsm_dev *tdev = tsm_dev_get(dev);
> +	ssize_t n = 0;
> +
> +	if (!tdev->meas) {
> +		n = sysfs_emit(buf, "none\n");
> +	} else {
> +		if (!n)
Always true.

> +			n = tsm_meas_gen(tdev->meas, buf, PAGE_SIZE);
> +		if (!n)
> +			n = blob_show(tdev->meas, buf);
> +	}
> +
> +	tsm_dev_put(tdev);
> +	return n;
> +}
> +
> +static DEVICE_ATTR_RO(tsm_meas);

> +
> +static struct attribute *host_dev_attrs[] = {
> +	&dev_attr_tsm_cert_slot.attr,
> +	&dev_attr_tsm_tc_mask.attr,
> +	&dev_attr_tsm_dev_connect.attr,
> +	&dev_attr_tsm_sel_stream.attr,
> +	&dev_attr_tsm_ide_refresh.attr,
> +	&dev_attr_tsm_certs.attr,
> +	&dev_attr_tsm_meas.attr,
> +	&dev_attr_tsm_dev_status.attr,
> +	NULL,
That final comma is not needed as we'll never add anything after
this.

> +};
> +static const struct attribute_group host_dev_group = {
> +	.attrs = host_dev_attrs,
> +};


> +
> +static ssize_t tsm_report_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct tsm_tdi *tdi = tsm_tdi_get(dev);
> +	ssize_t n = 0;
> +
> +	if (!tdi->report) {
> +		n = sysfs_emit(buf, "none\n");
> +	} else {
> +		if (!n)

Always true.

> +			n = tsm_report_gen(tdi->report, buf, PAGE_SIZE);
> +		if (!n)
> +			n = blob_show(tdi->report, buf);
> +	}
> +
> +	return n;

In all cases this is only set once, and once it is set nothing else
happens. Just return directly at those points.

> +}
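
i.e. (sketch):

static ssize_t tsm_report_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	struct tsm_tdi *tdi = tsm_tdi_get(dev);
	ssize_t n;

	if (!tdi->report)
		return sysfs_emit(buf, "none\n");

	n = tsm_report_gen(tdi->report, buf, PAGE_SIZE);
	if (n)
		return n;

	return blob_show(tdi->report, buf);
}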

> +static char *spdm_algos_to_str(u64 algos, char *buf, size_t len)
> +{
> +	size_t n = 0;
> +
> +	buf[0] = 0;
> +#define __ALGO(x) do {								\
> +		if ((n < len) && (algos & (1ULL << (TSM_TDI_SPDM_ALGOS_##x))))	\
> +			n += snprintf(buf + n, len - n, #x" ");			\
> +	} while (0)

Odd wrapping. I'd push the do { onto the next line and it will all look more normal.
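
i.e.:

#define __ALGO(x)							\
	do {								\
		if ((n < len) &&					\
		    (algos & (1ULL << (TSM_TDI_SPDM_ALGOS_##x))))	\
			n += snprintf(buf + n, len - n, #x" ");		\
	} while (0)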

> +
> +	__ALGO(DHE_SECP256R1);
> +	__ALGO(DHE_SECP384R1);
> +	__ALGO(AEAD_AES_128_GCM);
> +	__ALGO(AEAD_AES_256_GCM);
> +	__ALGO(ASYM_TPM_ALG_RSASSA_3072);
> +	__ALGO(ASYM_TPM_ALG_ECDSA_ECC_NIST_P256);
> +	__ALGO(ASYM_TPM_ALG_ECDSA_ECC_NIST_P384);
> +	__ALGO(HASH_TPM_ALG_SHA_256);
> +	__ALGO(HASH_TPM_ALG_SHA_384);
> +	__ALGO(KEY_SCHED_SPDM_KEY_SCHEDULE);
> +#undef __ALGO
> +	return buf;
> +}

> +
> +static ssize_t tsm_tdi_status_show(struct device *dev, struct device_attribute *attr, char *buf)
wrap
> +{

> +
> +static int tsm_tdi_init(struct tsm_dev *tdev, struct pci_dev *pdev)
> +{
> +	struct tsm_tdi *tdi;
> +	int ret = 0;
Set in all paths that use it, so no need to initialise it here.

> +
> +	dev_info(&pdev->dev, "Initializing tdi\n");
> +	if (!tdev)
> +		return -ENODEV;
Is this defense needed?  Seems overkill given we just
got the tdev in all paths that lead here.

> +
> +	tdi = kzalloc(sizeof(*tdi), GFP_KERNEL);
> +	if (!tdi)
> +		return -ENOMEM;
> +
> +	/* tsm_dev_get() requires pdev->dev.tdi which is set later */
> +	if (!kref_get_unless_zero(&tdev->kref)) {
> +		ret = -EPERM;
> +		goto free_exit;
> +	}
> +
> +	if (tsm.ops->dev_connect)
> +		tdi->ag = &host_tdi_group;
> +	else
> +		tdi->ag = &guest_tdi_group;
> +
> +	ret = sysfs_create_link(&pdev->dev.kobj, &tdev->pdev->dev.kobj, "tsm_dev");
> +	if (ret)
> +		goto free_exit;
> +
> +	ret = device_add_group(&pdev->dev, tdi->ag);
> +	if (ret)
> +		goto sysfs_unlink_exit;
> +
> +	tdi->tdev = tdev;
> +	tdi->pdev = pci_dev_get(pdev);
> +
> +	pdev->dev.tdi_enabled = !pdev->is_physfn || tsm.physfn;
> +	pdev->dev.tdi = tdi;

As below, __free() + pointer steal will be neater here.
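
i.e. roughly (sketch, relies on <linux/cleanup.h>):

	struct tsm_tdi *tdi __free(kfree) = kzalloc(sizeof(*tdi), GFP_KERNEL);

	if (!tdi)
		return -ENOMEM;
	...
	pdev->dev.tdi_enabled = !pdev->is_physfn || tsm.physfn;
	pdev->dev.tdi = no_free_ptr(tdi); /* steal the pointer, disarms the auto-kfree */

	return 0;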

> +	pci_info(pdev, "TDI enabled=%d\n", pdev->dev.tdi_enabled);
> +
> +	return 0;
> +
> +sysfs_unlink_exit:
> +	sysfs_remove_link(&pdev->dev.kobj, "tsm_dev");
> +free_exit:
> +	kfree(tdi);
> +
> +	return ret;
> +}

> +static int tsm_dev_init(struct pci_dev *pdev, struct tsm_dev **ptdev)
> +{
> +	struct tsm_dev *tdev;
> +	int ret = 0;
> +
> +	dev_info(&pdev->dev, "Initializing tdev\n");

dev_dbg() for non RFC versions.

> +	tdev = kzalloc(sizeof(*tdev), GFP_KERNEL);
> +	if (!tdev)
> +		return -ENOMEM;
> +
> +	kref_init(&tdev->kref);
> +	tdev->tc_mask = tsm.tc_mask;
> +	tdev->cert_slot = tsm.cert_slot;
> +	tdev->pdev = pci_dev_get(pdev);
> +	mutex_init(&tdev->spdm_mutex);
> +
> +	if (tsm.ops->dev_connect)
> +		tdev->ag = &host_dev_group;
> +	else
> +		tdev->ag = &guest_dev_group;
> +
> +	ret = device_add_group(&pdev->dev, tdev->ag);
> +	if (ret)
> +		goto free_exit;
> +
> +	if (tsm.ops->dev_connect) {
> +		ret = -EPERM;
> +		tdev->pdev = pci_dev_get(pdev);
> +		tdev->spdm.doe_mb = pci_find_doe_mailbox(tdev->pdev,
> +							 PCI_VENDOR_ID_PCI_SIG,
> +							 PCI_DOE_PROTOCOL_CMA_SPDM);
> +		if (!tdev->spdm.doe_mb)
> +			goto pci_dev_put_exit;
Group not released
> +
> +		tdev->spdm.doe_mb_secured = pci_find_doe_mailbox(tdev->pdev,
> +								 PCI_VENDOR_ID_PCI_SIG,
> +								 PCI_DOE_PROTOCOL_SECURED_CMA_SPDM);
Long lines.  I'd wrap after =

> +		if (!tdev->spdm.doe_mb_secured)
> +			goto pci_dev_put_exit;
nor here.

> +	}
> +
> +	*ptdev = tdev;

Could use __free() magic for tdev and steal the ptr here.
Maybe not worth the effort though given you need the error block anyway.


> +	return 0;
> +
> +pci_dev_put_exit:
> +	pci_dev_put(pdev);
> +free_exit:
> +	kfree(tdev);
> +
> +	return ret;
> +}
> +
> +static void tsm_dev_free(struct kref *kref)
> +{
> +	struct tsm_dev *tdev = container_of(kref, struct tsm_dev, kref);
> +
> +	device_remove_group(&tdev->pdev->dev, tdev->ag);
> +
> +	if (tdev->connected)
> +		tsm_dev_reclaim(tdev, tsm.private_data);
> +
> +	dev_info(&tdev->pdev->dev, "Freeing TDEV\n");

dev_dbg() eventually but fine for RFC.

> +	pci_dev_put(tdev->pdev);
> +	kfree(tdev);
> +}
> +
> +static int tsm_alloc_device(struct pci_dev *pdev)
> +{
> +	int ret = 0;
> +
> +	/* It is guest VM == TVM */
> +	if (!tsm.ops->dev_connect) {
> +		if (pdev->devcap & PCI_EXP_DEVCAP_TEE_IO) {
> +			struct tsm_dev *tdev = NULL;
> +
> +			ret = tsm_dev_init(pdev, &tdev);
> +			if (ret)
> +				return ret;
> +
> +			ret = tsm_tdi_init(tdev, pdev);
> +			tsm_dev_put(tdev);
> +			return ret;
> +		}
> +		return 0;
> +	}
> +
> +	if (pdev->is_physfn && (PCI_FUNC(pdev->devfn) == 0) &&
> +	    (pdev->devcap & PCI_EXP_DEVCAP_TEE_IO)) {
> +		struct tsm_dev *tdev = NULL;
> +
Trivial, but... one blank line is enough here.
> +
> +		ret = tsm_dev_init(pdev, &tdev);
> +		if (ret)
> +			return ret;
> +
> +		ret = tsm_tdi_init(tdev, pdev);
> +		tsm_dev_put(tdev);
> +		return ret;
> +	}
> +
> +	if (pdev->is_virtfn) {
> +		struct pci_dev *pf0 = pci_get_slot(pdev->physfn->bus,
> +						   pdev->physfn->devfn & ~7);
> +
> +		if (pf0 && (pf0->devcap & PCI_EXP_DEVCAP_TEE_IO)) {
> +			struct tsm_dev *tdev = tsm_dev_get(&pf0->dev);
> +
> +			ret = tsm_tdi_init(tdev, pdev);
> +			tsm_dev_put(tdev);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}

> +
> +static struct notifier_block tsm_pci_bus_nb = {
> +	.notifier_call = tsm_pci_bus_notifier,
> +};
> +
> +static int __init tsm_init(void)
> +{
> +	int ret = 0;
> +
> +	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> +	return ret;
> +}
> +
> +static void __exit tsm_cleanup(void)
> +{
> +}

These aren't needed. If it's a library module it will have
nothing to do on init / exit which is fine.
And we don't care about versions! If we do then the discoverability
of features etc is totally broken.


> +
> +void tsm_set_ops(struct tsm_ops *ops, void *private_data)
> +{
> +	struct pci_dev *pdev = NULL;
> +	int ret;
> +
> +	if (!tsm.ops && ops) {
> +		tsm.ops = ops;
> +		tsm.private_data = private_data;
> +
> +		for_each_pci_dev(pdev) {
> +			ret = tsm_alloc_device(pdev);
> +			if (ret)
> +				break;
> +		}
> +		bus_register_notifier(&pci_bus_type, &tsm_pci_bus_nb);
> +	} else {
> +		bus_unregister_notifier(&pci_bus_type, &tsm_pci_bus_nb);
> +		for_each_pci_dev(pdev)
> +			tsm_dev_freeice(&pdev->dev);
> +		tsm.ops = ops;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(tsm_set_ops);
> +
> +int tsm_tdi_bind(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, u32 asid)
> +{
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->tdi_bind))
> +		return -EPERM;
> +
> +	tdi->guest_rid = guest_rid;
> +	tdi->vmid = vmid;
> +	tdi->asid = asid;
> +
> +	mutex_lock(&tdi->tdev->spdm_mutex);

scoped_guard() may help here and in the good path at least
allow a direct return.

> +	while (1) {
> +		ret = tsm.ops->tdi_bind(tdi, guest_rid, vmid, asid, tsm.private_data);
> +		if (ret < 0)
> +			break;
> +
> +		if (!ret)
> +			break;
> +
> +		ret = spdm_forward(&tdi->tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdi->tdev->spdm_mutex);
> +
> +	if (ret) {

I'd have separate err_unlock label and error handling path.
This pattern is somewhat harder to read.

> +		tsm_tdi_unbind(tdi);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(tsm_tdi_bind);
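
Putting the scoped_guard() and error-path comments together, the body
could read roughly (untested):

	scoped_guard(mutex, &tdi->tdev->spdm_mutex) {
		while (1) {
			ret = tsm.ops->tdi_bind(tdi, guest_rid, vmid, asid,
						tsm.private_data);
			if (!ret)
				return 0;
			if (ret < 0)
				break;

			ret = spdm_forward(&tdi->tdev->spdm, ret);
			if (ret < 0)
				break;
		}
	}

	tsm_tdi_unbind(tdi);

	return ret;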
> +
> +void tsm_tdi_unbind(struct tsm_tdi *tdi)
> +{
> +	tsm_tdi_reclaim(tdi, tsm.private_data);
> +	tdi->vmid = 0;
> +	tdi->asid = 0;
> +	tdi->guest_rid = 0;
> +}
> +EXPORT_SYMBOL_GPL(tsm_tdi_unbind);
> +
> +int tsm_guest_request(struct tsm_tdi *tdi, enum tsm_tdisp_state *state, void *req_data)
> +{
> +	int ret;
> +
> +	if (!tsm.ops->guest_request)
> +		return -EPERM;
> +
> +	mutex_lock(&tdi->tdev->spdm_mutex);
guard(mutex)(&tdi->tdev->spdm_mutex);

Then you can do returns on error or finish instead of breaking out
just to unlock.



> +	while (1) {
> +		ret = tsm.ops->guest_request(tdi, tdi->guest_rid, tdi->vmid, req_data,
> +					     state, tsm.private_data);
> +		if (ret <= 0)
> +			break;
> +
> +		ret = spdm_forward(&tdi->tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdi->tdev->spdm_mutex);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tsm_guest_request);

> +
> +module_init(tsm_init);
Put these next to the relevant functions - or put the functions next to these.
> +module_exit(tsm_cleanup);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/Documentation/virt/coco/tsm.rst b/Documentation/virt/coco/tsm.rst
> new file mode 100644
> index 000000000000..3be6e8491e42
> --- /dev/null
> +++ b/Documentation/virt/coco/tsm.rst

I'd break this out as a separate patch - mostly to shout "DOCS here - read them"
as otherwise they end up at the end of a long email no one scrolls through.

> @@ -0,0 +1,62 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +What it is
> +==========
> +
> +This is for PCI passthrough in confidential computing (CoCo: SEV-SNP, TDX, CoVE).
> +Currently passing through PCI devices to a CoCo VM uses SWIOTLB to pre-shared
> +memory buffers.
> +
> +PCIe IDE (Integrity and Data Encryption) and TDISP (TEE Device Interface Security
> +Protocol) are protocols to enable encryption over PCIe link and DMA to encrypted
> +memory. This doc is focused to DMAing to encrypted VM, the encrypted host memory is
> +out of scope.
> +
> +
> +Protocols
> +=========
> +
> +PCIe r6 DOE is a mailbox protocol to read/write object from/to device.
> +Objects are of plain SPDM or secure SPDM type. SPDM is responsible for authenticating
> +devices, creating a secure link between a device and TSM.
> +IDE_KM manages PCIe link encryption keys, it works on top of secure SPDM.
> +TDISP manages a passed through PCI function state, also works on top on secure SPDM.
> +Additionally, PCIe defines IDE capability which provides the host OS a way
> +to enable streams on the PCIe link.
> +
> +
> +TSM module
> +==========
> +
> +This is common place to trigger device authentication and keys management.
> +It exposes certificates/measurenets/reports/status via sysfs and provides control

measurements 

> +over the link (limited though by the TSM capabilities).
> +A platform is expected to register a specific set of hooks. The same module works
> +in host and guest OS, the set of requires platform hooks is quite different.
> +
> +
> +Flow
> +====
> +
> +At the boot time the tsm.ko scans the PCI bus to find and setup TDISP-cabable
> +devices; it also listens to hotplug events. If setup was successful, tsm-prefixed
> +nodes will appear in sysfs.
> +
> +Then, the user enables IDE by writing to /sys/bus/pci/devices/0000:e1:00.0/tsm_dev_connect
> +and this is how PCIe encryption is enabled.
> +
> +To pass the device through, a modifined VMM is required.
> +
> +In the VM, the same tsm.ko loads. In addition to the host's setup, the VM wants
> +to receive the report and enable secure DMA or/and secure MMIO, via some VM<->HV
> +protocol (such as AMD GHCB). Once this is done, a VM can access validated MMIO
> +with the Cbit set and the device can DMA to encrypted memory.
> +
> +
> +References
> +==========
> +
> +[1] TEE Device Interface Security Protocol - TDISP - v2022-07-27
> +https://members.pcisig.com/wg/PCI-SIG/document/18268?downloadRevision=21500
> +[2] Security Protocol and Data Model (SPDM)
> +https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.2.1.pdf
> diff --git a/drivers/virt/coco/Kconfig b/drivers/virt/coco/Kconfig
> index 87d142c1f932..67a9c9daf96d 100644
> --- a/drivers/virt/coco/Kconfig
> +++ b/drivers/virt/coco/Kconfig
> @@ -7,6 +7,17 @@ config TSM_REPORTS
>  	select CONFIGFS_FS
>  	tristate
>  
> +config TSM
> +	tristate "Platform support for TEE Device Interface Security Protocol (TDISP)"
> +	default m

No defaulting to m.  People get grumpy when this stuff turns up on their embedded
distros with no hardware support.

> +	depends on AMD_MEM_ENCRYPT
> +	select PCI_DOE
> +	select PCI_IDE
> +	help
> +	  Add a common place for user visible platform support for PCIe TDISP.
> +	  TEE Device Interface Security Protocol (TDISP) from PCI-SIG,
> +	  https://pcisig.com/tee-device-interface-security-protocol-tdisp
> +
>  source "drivers/virt/coco/efi_secret/Kconfig"
>  
>  source "drivers/virt/coco/sev-guest/Kconfig"
Jason Gunthorpe Aug. 28, 2024, 11:42 p.m. UTC | #4
On Wed, Aug 28, 2024 at 01:00:46PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 27/8/24 22:32, Jason Gunthorpe wrote:
> > On Fri, Aug 23, 2024 at 11:21:21PM +1000, Alexey Kardashevskiy wrote:
> > > The module responsibilities are:
> > > 1. detect TEE support in a device and create nodes in the device's sysfs
> > > entry;
> > > 2. allow binding a PCI device to a VM for passing it through in a trusted
> > > manner;
> > 
> > Binding devices to VMs and managing their lifecycle is the purview of
> > VFIO and iommufd, it should not be exposed via weird sysfs calls like
> > this. You can't build the right security model without being inside
> > the VFIO context.
> 
> Is "extend the MAP_DMA uAPI to accept {gmemfd, offset}" enough for the VFIO
> context, or there is more and I am missing it?

No, you need to have all the virtual PCI device creation stuff linked
to a VFIO cdev to prove you have rights to do things to the physical
device.
 
> > I'm not convinced this should be in some side module - it seems like
> > this is possibly more logically integrated as part of the iommu..
> 
> There are two things which the module's sysfs interface tries dealing with:
> 
> 1) device authentication (by the PSP, contrary to Lukas'es host-based CMA)
> and PCIe link encryption (PCIe IDE keys only programmable via the PSP);

So when I look at the spec I think that probably TIO_DEV_* should be
connected to VFIO, somewhere as vfio/kvm/iommufd ioctls. This needs to
be coordinated with everyone else because everyone has *some kind* of
"trusted world create for me a vPCI device in the secure VM" set of
verbs.

TIO_TDI is presumably the device authentication stuff?

This is why I picked on tsm_dev_connect_store()..

> Besides sysfs, the module provides common "verbs" to be defined by the
> platform (which is right now a reduced set of the AMD PSP operations but the
> hope is it can be generalized); and the module also does PCIe DOE bouncing
> (which is also not uncommon). Part of this exercise is trying to find some
> common ground (if it is possible), hence routing everything via this module.

I think there is a separation between how the internal stuff in the
kernel works and how/what the uAPIs are.

General stuff like authenticate/accept/authorize a PCI device needs
to be pretty cross platform.

Stuff like creating vPCIs needs to be ioctls linked to KVM/VFIO
somehow and can have more platform specific components.

I would try to split your topics up more along those lines..

Jason
Dan Williams Aug. 29, 2024, midnight UTC | #5
Jason Gunthorpe wrote:
[..]
> So when I look at the spec I think that probably TIO_DEV_* should be
> connected to VFIO, somewhere as vfio/kvm/iommufd ioctls. This needs to
> be coordinated with everyone else because everyone has *some kind* of
> "trusted world create for me a vPCI device in the secure VM" set of
> verbs.
> 
> TIO_TDI is presumably the device authentication stuff?

I would expect no, because device authentication is purely a
physical-device concept, and a TDI is some subset of that device (up to
and including full physical-function passthrough) that becomes VM
private-world assignable.

> This is why I picked on tsm_dev_connect_store()..
> 
> > Besides sysfs, the module provides common "verbs" to be defined by the
> > platform (which is right now a reduced set of the AMD PSP operations but the
> > hope is it can be generalized); and the module also does PCIe DOE bouncing
> > (which is also not uncommon). Part of this exercise is trying to find some
> > common ground (if it is possible), hence routing everything via this module.
> 
> I think there is a separation between how the internal stuff in the
> kernel works and how/what the uAPIs are.
> 
> General stuff like authenticate/accept/authorize a PCI device needs
> to be pretty cross platform.
> 
> Stuff like creating vPCIs needs to be ioctls linked to KVM/VFIO
> somehow and can have more platform specific components.
> 
> I would try to split your topics up more along those lines..

I agree with this. There is a definite PCI only / VFIO-independent
portion of this that is before any consideration of TDISP LOCKED and RUN
states. It only deals with PCI device-authentication, link encryption
management, and is independent of any confidential VM. Then there is the
whole "assignable device" piece that is squarely KVM/VFIO territory.

Theoretically one could stop at link encryption setup and never proceed
with the rest. That is, assuming the platform allows for IDE protected
traffic to flow in the "T=0" (shared world device) case.
Jason Gunthorpe Aug. 29, 2024, 12:09 a.m. UTC | #6
On Wed, Aug 28, 2024 at 05:00:57PM -0700, Dan Williams wrote:
> Jason Gunthorpe wrote:
> [..]
> > So when I look at the spec I think that probably TIO_DEV_* should be
> > connected to VFIO, somewhere as vfio/kvm/iommufd ioctls. This needs to
> > be coordinated with everyone else because everyone has *some kind* of
> > "trusted world create for me a vPCI device in the secure VM" set of
> > verbs.
> > 
> > TIO_TDI is presumably the device authentication stuff?
> 
> I would expect no, because device authentication is purely a
> physical-device concept, and a TDI is some subset of that device (up to
> and including full physical-function passthrough) that becomes VM
> private-world assignable.

So I got it backwards then? The TDI is the vPCI and DEV is the way to
operate TDISP/IDE/SPDM/etc? Spec says:

 To use a TDISP capable device with SEV-TIO, host software must first
 arrange for the SEV firmware to establish a connection with the device
 by invoking the TIO_DEV_CONNECT
 command. The TIO_DEV_CONNECT command performs the following:

 * Establishes a secure SPDM session using Secured Messages for SPDM.
 * Constructs IDE selective streams between the root complex and the device.
 * Checks the TDISP capabilities of the device.

Too many TLAs :O

> I agree with this. There is a definite PCI only / VFIO-independent
> portion of this that is before any consideration of TDISP LOCKED and RUN
> states. It only deals with PCI device-authentication, link encryption
> management, and is independent of any confidential VM. Then there is the
> whole "assignable device" piece that is squarely KVM/VFIO territory.

Yes
 
> Theoretically one could stop at link encryption setup and never proceed
> with the rest. That is, assuming the platform allows for IDE protected
> traffic to flow in the "T=0" (shared world device) case.

Yes. I keep hearing PCI people talking about interesting use cases for
IDE streams independent of any of the confidential compute stuff. I
think they should not be tied together.

Jason
Dan Williams Aug. 29, 2024, 12:20 a.m. UTC | #7
Jason Gunthorpe wrote:
> On Wed, Aug 28, 2024 at 05:00:57PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > [..]
> > > So when I look at the spec I think that probably TIO_DEV_* should be
> > > connected to VFIO, somewhere as vfio/kvm/iommufd ioctls. This needs to
> > > be coordinated with everyone else because everyone has *some kind* of
> > > "trusted world create for me a vPCI device in the secure VM" set of
> > > verbs.
> > > 
> > > TIO_TDI is presumably the device authentication stuff?
> > 
> > I would expect no, because device authentication is purely a
> > physical-device concept, and a TDI is some subset of that device (up to
> > and including full physical-function passthrough) that becomes VM
> > private-world assignable.
> 
> So I got it backwards then? The TDI is the vPCI and DEV is the way to
> operate TDISP/IDE/SPDM/etc? Spec says:

Right.

> 
>  To use a TDISP capable device with SEV-TIO, host software must first
>  arrange for the SEV firmware to establish a connection with the device
>  by invoking the TIO_DEV_CONNECT
>  command. The TIO_DEV_CONNECT command performs the following:
> 
>  * Establishes a secure SPDM session using Secured Messages for SPDM.
>  * Constructs IDE selective streams between the root complex and the device.
>  * Checks the TDISP capabilities of the device.
> 
> Too many TLAs :O

My favorite is that the spec calls the device capability "TEE I/O" but
when you are speaking it no one can tell if you are talking about "TEE
I/O" the generic PCI thing, or "TIO", the AMD-specific seasoning on top
of TDISP.

> > I agree with this. There is a definite PCI only / VFIO-independent
> > portion of this that is before any consideration of TDISP LOCKED and RUN
> > states. It only deals with PCI device-authentication, link encryption
> > management, and is independent of any confidential VM. Then there is the
> > whole "assignable device" piece that is squarely KVM/VFIO territory.
> 
> Yes
>  
> > Theoretically one could stop at link encryption setup and never proceed
> > with the rest. That is, assuming the platform allows for IDE protected
> > traffic to flow in the "T=0" (shared world device) case.
> 
> Yes. I keep hearing PCI people talking about interesting use cases for
> IDE streams independent of any of the confidential compute stuff. I
> think they should not be tied together.

I encourage those folks to read the actual hardware specs, not just
the PCI spec. As far as I know there is only one host platform
implementation that allows IDE establishment and traffic flow for T=0
cases. So it is not yet trending to be a common thing that the PCI core
can rely upon.
Alexey Kardashevskiy Aug. 29, 2024, 4:57 a.m. UTC | #8
On 29/8/24 09:42, Jason Gunthorpe wrote:
> On Wed, Aug 28, 2024 at 01:00:46PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 27/8/24 22:32, Jason Gunthorpe wrote:
>>> On Fri, Aug 23, 2024 at 11:21:21PM +1000, Alexey Kardashevskiy wrote:
>>>> The module responsibilities are:
>>>> 1. detect TEE support in a device and create nodes in the device's sysfs
>>>> entry;
>>>> 2. allow binding a PCI device to a VM for passing it through in a trusted
>>>> manner;
>>>
>>> Binding devices to VMs and managing their lifecycle is the purview of
>>> VFIO and iommufd, it should not be exposed via weird sysfs calls like
>>> this. You can't build the right security model without being inside
>>> the VFIO context.
>>
>> Is "extend the MAP_DMA uAPI to accept {gmemfd, offset}" enough for the VFIO
>> context, or there is more and I am missing it?
> 
> No, you need to have all the virtual PCI device creation stuff linked
> to a VFIO cdev to prove you have rights to do things to the physical
> device.

The VM-to-VFIOdevice binding is already in the KVM VFIO device, the rest 
is the same old VFIO.

>>> I'm not convinced this should be in some side module - it seems like
>>> this is possibly more logically integrated as part of the iommu..
>>
>> There are two things which the module's sysfs interface tries dealing with:
>>
>> 1) device authentication (by the PSP, contrary to Lukas'es host-based CMA)
>> and PCIe link encryption (PCIe IDE keys only programmable via the PSP);
> 
> So when I look at the spec I think that probably TIO_DEV_* should be
> connected to VFIO, somewhere as vfio/kvm/iommufd ioctls. This needs to
> be coordinated with everyone else because everyone has *some kind* of
> "trusted world create for me a vPCI device in the secure VM" set of
> verbs.
> 
> TIO_TDI is presumably the device authentication stuff?

Not really. In practice:
- TDI is usually a SRIOV VF (and intended for passing through to a VM);
- DEV is always PF#0 of a (possibly multifunction) physical device which 
controls physical link encryption and authentication.

> This is why I picked on tsm_dev_connect_store()..
> 
>> Besides sysfs, the module provides common "verbs" to be defined by the
>> platform (which is right now a reduced set of the AMD PSP operations but the
>> hope is it can be generalized); and the module also does PCIe DOE bouncing
>> (which is also not uncommon). Part of this exercise is trying to find some
>> common ground (if it is possible), hence routing everything via this module.
> 
> I think there is a separation between how the internal stuff in the
> kernel works and how/what the uAPIs are.
> 
> General stuff like authenticate/accept/authorize a PCI device needs
> to be pretty cross platform.

So the TIO_DEV_* commands are for the PCI layer, nothing there is for 
virtualization, and this is what is exposed via sysfs.

The TIO_TDI_* commands are for virtualization: the user cannot 
attach a TDI to a VM via the sysfs interface (new ioctls are for 
that), the user can only read some shared info about a TDI (interface 
report, status).

> Stuff like creating vPCIs needs to be ioctls linked to KVM/VFIO
> somehow and can have more platform specific components.

Right, I am adding ioctls to the KVM VFIO device.

> I would try to split your topics up more along those lines..

Fair point, my bad. I'll start with a PF-only module and then add TDI 
stuff to it separately.

I wonder if there is enough value to try keeping the TIO_DEV_* and 
TIO_TDI_* API together or having TIO_DEV_* in some PCI module and 
TIO_TDI_* in KVM is a non-confusing way to proceed with this. Adding 
things to the PCI's sysfs from more places bothers me more than this 
frankenmodule. Thanks,
Jason Gunthorpe Aug. 29, 2024, 12:03 p.m. UTC | #9
On Wed, Aug 28, 2024 at 05:20:16PM -0700, Dan Williams wrote:

> I encourage those folks to read the actual hardware specs, not just
> the PCI spec. As far as I know there is only one host platform
> implementation that allows IDE establishment and traffic flow for T=0
> cases. So it is not yet trending to be a common thing that the PCI core
> can rely upon.

I think their perspective is that this is the only path to do certain
things in PCI land (from a packet-on-the-wire perspective), so they
would expect the platforms to align eventually if the standards go
that way..

Jason
Jason Gunthorpe Aug. 29, 2024, 12:07 p.m. UTC | #10
On Thu, Aug 29, 2024 at 02:57:34PM +1000, Alexey Kardashevskiy wrote:

> > > Is "extend the MAP_DMA uAPI to accept {gmemfd, offset}" enough for the VFIO
> > > context, or there is more and I am missing it?
> > 
> > No, you need to have all the virtual PCI device creation stuff linked
> > to a VFIO cdev to prove you have rights to do things to the physical
> > device.
> 
> The VM-to-VFIOdevice binding is already in the KVM VFIO device, the rest is
> the same old VFIO.

Frankly, I'd rather not add any more VFIO stuff to KVM. Today KVM has
no idea of a VFIO on most platforms.

Given you already have an issue with iommu driver synchronization this
looks like it might be a poor choice anyhow..

> I wonder if there is enough value to try keeping the TIO_DEV_* and TIO_TDI_*
> API together or having TIO_DEV_* in some PCI module and TIO_TDI_* in KVM is
> a non-confusing way to proceed with this. Adding things to the PCI's sysfs
> from more places bothers me more than this frankenmodule. Thanks,

I wouldn't mix them up, they are very different. Just because they are
RPCs to the same bit of FW doesn't really mean they should be together
in the same interfaces or ops structures.

Jason
Alexey Kardashevskiy Sept. 2, 2024, 12:52 a.m. UTC | #11
On 29/8/24 22:07, Jason Gunthorpe wrote:
> On Thu, Aug 29, 2024 at 02:57:34PM +1000, Alexey Kardashevskiy wrote:
> 
>>>> Is "extend the MAP_DMA uAPI to accept {gmemfd, offset}" enough for the VFIO
>>>> context, or there is more and I am missing it?
>>>
>>> No, you need to have all the virtual PCI device creation stuff linked
>>> to a VFIO cdev to prove you have rights to do things to the physical
>>> device.
>>
>> The VM-to-VFIOdevice binding is already in the KVM VFIO device, the rest is
>> the same old VFIO.
> 
> Frankly, I'd rather not add any more VFIO stuff to KVM. Today KVM has
> no idea of a VFIO on most platforms.
> 
> Given you already have an issue with iommu driver synchronization this
> looks like it might be a poor choice anyhow..
> 
>> I wonder if there is enough value to try keeping the TIO_DEV_* and TIO_TDI_*
>> API together or having TIO_DEV_* in some PCI module and TIO_TDI_* in KVM is
>> a non-confusing way to proceed with this. Adding things to the PCI's sysfs
>> from more places bothers me more than this frankenmodule. Thanks,
> 
> I wouldn't mix them up, they are very different. Just because they are
> RPCs to the same bit of FW doesn't really mean they should be together
> in the same interfaces or ops structures.

Both DEV_* and TDI_* use the same SecureSPDM channel (on top of the 
PF#0's PCIe DOE cap) for IDE_KM (for DEV_*) and TDISP (for TDI_*) so 
there is some common ground. Thanks,
Aneesh Kumar K.V Sept. 2, 2024, 6:50 a.m. UTC | #12
...

> +static int tsm_dev_connect(struct tsm_dev *tdev, void *private_data, unsigned int val)
> +{
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->dev_connect))
> +		return -EPERM;
> +
> +	tdev->ide_pre = val == 2;
> +	if (tdev->ide_pre)
> +		tsm_set_sel_ide(tdev);
> +
> +	mutex_lock(&tdev->spdm_mutex);
> +	while (1) {
> +		ret = tsm.ops->dev_connect(tdev, tsm.private_data);
> +		if (ret <= 0)
> +			break;
> +
> +		ret = spdm_forward(&tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdev->spdm_mutex);
> +
> +	if (!tdev->ide_pre)
> +		ret = tsm_set_sel_ide(tdev);
> +
> +	tdev->connected = (ret == 0);
> +
> +	return ret;
> +}
> +

I was expecting the DEV_CONNECT to happen in tsm_dev_init in
tsm_alloc_device(). Can you describe how the sysfs file is going to be
used? I didn't find details regarding that in the cover letter 
workflow section. 

-aneesh
Alexey Kardashevskiy Sept. 2, 2024, 7:26 a.m. UTC | #13
On 2/9/24 16:50, Aneesh Kumar K.V wrote:
> 
> ...
> 
>> +static int tsm_dev_connect(struct tsm_dev *tdev, void *private_data, unsigned int val)
>> +{
>> +	int ret;
>> +
>> +	if (WARN_ON(!tsm.ops->dev_connect))
>> +		return -EPERM;
>> +
>> +	tdev->ide_pre = val == 2;
>> +	if (tdev->ide_pre)
>> +		tsm_set_sel_ide(tdev);
>> +
>> +	mutex_lock(&tdev->spdm_mutex);
>> +	while (1) {
>> +		ret = tsm.ops->dev_connect(tdev, tsm.private_data);
>> +		if (ret <= 0)
>> +			break;
>> +
>> +		ret = spdm_forward(&tdev->spdm, ret);
>> +		if (ret < 0)
>> +			break;
>> +	}
>> +	mutex_unlock(&tdev->spdm_mutex);
>> +
>> +	if (!tdev->ide_pre)
>> +		ret = tsm_set_sel_ide(tdev);
>> +
>> +	tdev->connected = (ret == 0);
>> +
>> +	return ret;
>> +}
>> +
> 
> I was expecting the DEV_CONNECT to happen in tsm_dev_init in
> tsm_alloc_device(). Can you describe how the sysfs file is going to be
> used? I didn't find details regarding that in the cover letter
> workflow section.

Until I figure out the cooperation with the host-based CMA from Lukas, I 
do not automatically enable IDE. Instead, the operator needs to enable 
IDE manually:

sudo bash -c 'echo 2 > /sys/bus/pci/devices/0000:e1:00.0/tsm_dev_connect'

where e1:00.0 is physical function 0 of the device; or "echo 0" to 
disable the IDE encryption. Why "2" is different from "1" - this is a 
leftover from debugging. Thanks,


> 
> -aneesh
Dan Williams Sept. 3, 2024, 11:51 p.m. UTC | #14
Alexey Kardashevskiy wrote:
> The module responsibilities are:
> 1. detect TEE support in a device and create nodes in the device's sysfs
> entry;
> 2. allow binding a PCI device to a VM for passing it through in a trusted
> manner;
> 3. store measurements/certificates/reports and provide access to those for
> the userspace via sysfs.
> 
> This relies on the platform to register a set of callbacks,
> for both host and guest.
> 
> And tdi_enabled in the device struct.

I had been holding out hope that when I got this patch the changelog
would give some justification for what folks had been whispering to me
in recent days: "hey Dan, looks like Alexey is completely ignoring the
PCI/TSM approach?".

Bjorn acked that approach here:

http://lore.kernel.org/20240419220729.GA307280@bhelgaas

It is in need of a refresh, preview here:

https://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux.git/commit/?id=5807465b92ac

At best, I am disappointed that this RFC ignored it. More comments
below, but please do clarify if we are working together on a Bjorn-acked
direction, or not.

> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> ---
>  drivers/virt/coco/Makefile      |    1 +
>  include/linux/device.h          |    5 +
>  include/linux/tsm.h             |  263 ++++
>  drivers/virt/coco/tsm.c         | 1336 ++++++++++++++++++++
>  Documentation/virt/coco/tsm.rst |   62 +
>  drivers/virt/coco/Kconfig       |   11 +
>  6 files changed, 1678 insertions(+)
> 
> diff --git a/drivers/virt/coco/Makefile b/drivers/virt/coco/Makefile
> index 75defec514f8..5d1aefb62714 100644
> --- a/drivers/virt/coco/Makefile
> +++ b/drivers/virt/coco/Makefile
> @@ -3,6 +3,7 @@
>  # Confidential computing related collateral
>  #
>  obj-$(CONFIG_TSM_REPORTS)	+= tsm-report.o
> +obj-$(CONFIG_TSM)		+= tsm.o

The expectation is that something like drivers/virt/coco/tsm.c would be
the class driver for cross-vendor generic TSM uAPI. The PCI specific
bits go in drivers/pci/tsm.c.

>  obj-$(CONFIG_EFI_SECRET)	+= efi_secret/
>  obj-$(CONFIG_SEV_GUEST)		+= sev-guest/
>  obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx-guest/
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 34eb20f5966f..bb58ed1fb8da 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -45,6 +45,7 @@ struct fwnode_handle;
>  struct iommu_group;
>  struct dev_pin_info;
>  struct dev_iommu;
> +struct tsm_tdi;
>  struct msi_device_data;
>  
>  /**
> @@ -801,6 +802,7 @@ struct device {
>  	void	(*release)(struct device *dev);
>  	struct iommu_group	*iommu_group;
>  	struct dev_iommu	*iommu;
> +	struct tsm_tdi		*tdi;

No. The only known device model for TDIs is PCI devices, i.e. TDISP is a
PCI protocol. Even SPDM, which is cross-device-type generic, did not touch
'struct device'.

>  	struct device_physical_location *physical_location;
>  
> @@ -822,6 +824,9 @@ struct device {
>  #ifdef CONFIG_DMA_NEED_SYNC
>  	bool			dma_skip_sync:1;
>  #endif
> +#if defined(CONFIG_TSM) || defined(CONFIG_TSM_MODULE)
> +	bool			tdi_enabled:1;
> +#endif
>  };
>  
>  /**
> diff --git a/include/linux/tsm.h b/include/linux/tsm.h
> new file mode 100644
> index 000000000000..d48eceaf5bc0
> --- /dev/null
> +++ b/include/linux/tsm.h
> @@ -0,0 +1,263 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef LINUX_TSM_H
> +#define LINUX_TSM_H
> +
> +#include <linux/cdev.h>
> +
> +/* SPDM control structure for DOE */
> +struct tsm_spdm {
> +	unsigned long req_len;
> +	void *req;
> +	unsigned long rsp_len;
> +	void *rsp;
> +
> +	struct pci_doe_mb *doe_mb;
> +	struct pci_doe_mb *doe_mb_secured;
> +};
> +
> +/* Data object for measurements/certificates/attestationreport */
> +struct tsm_blob {
> +	void *data;
> +	size_t len;
> +	struct kref kref;
> +	void (*release)(struct tsm_blob *b);

Hard to judge the suitability of these without documentation. Skeptical
that these TDISP evidence blobs need to have a lifetime distinct from
the device's TSM context.

> +};
> +
> +struct tsm_blob *tsm_blob_new(void *data, size_t len, void (*release)(struct tsm_blob *b));
> +struct tsm_blob *tsm_blob_get(struct tsm_blob *b);
> +void tsm_blob_put(struct tsm_blob *b);
> +
> +/**
> + * struct tdisp_interface_id - TDISP INTERFACE_ID Definition
> + *
> + * @function_id: Identifies the function of the device hosting the TDI
> + * 15:0: @rid: Requester ID
> + * 23:16: @rseg: Requester Segment (Reserved if Requester Segment Valid is Clear)
> + * 24: @rseg_valid: Requester Segment Valid
> + * 31:25 – Reserved
> + * 8B - Reserved
> + */
> +struct tdisp_interface_id {
> +	union {
> +		struct {
> +			u32 function_id;
> +			u8 reserved[8];
> +		};
> +		struct {
> +			u16 rid;
> +			u8 rseg;
> +			u8 rseg_valid:1;

Linux typically avoids C-bitfields in hardware interfaces in favor of
bitfield.h macros.

> +		};
> +	};
> +} __packed;

Does this need to be "packed"? Looks naturally aligned to pahole.
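
For the bitfield.h comment above, the usual pattern is something along
these lines (the mask names are invented, field positions taken from the
kerneldoc above, and endianness handling is left aside):

#include <linux/bitfield.h>

#define TDISP_IFC_ID_RID	GENMASK(15, 0)
#define TDISP_IFC_ID_RSEG	GENMASK(23, 16)
#define TDISP_IFC_ID_RSEG_VALID	BIT(24)

	u32 fid = id->function_id;	/* 'id' being a struct tdisp_interface_id * */
	u16 rid = FIELD_GET(TDISP_IFC_ID_RID, fid);
	u8 rseg = FIELD_GET(TDISP_IFC_ID_RSEG, fid);
	bool rseg_valid = FIELD_GET(TDISP_IFC_ID_RSEG_VALID, fid);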

> +
> +/*
> + * Measurement block as defined in SPDM DSP0274.
> + */
> +struct spdm_measurement_block_header {
> +	u8 index;
> +	u8 spec; /* MeasurementSpecification */
> +	u16 size;
> +} __packed;
> +
> +struct dmtf_measurement_block_header {
> +	u8 type;  /* DMTFSpecMeasurementValueType */
> +	u16 size; /* DMTFSpecMeasurementValueSize */
> +} __packed;

This one might need to be packed.

> +
> +struct dmtf_measurement_block_device_mode {
> +	u32 opmode_cap;	 /* OperationalModeCapabilties */
> +	u32 opmode_sta;  /* OperationalModeState */
> +	u32 devmode_cap; /* DeviceModeCapabilties */
> +	u32 devmode_sta; /* DeviceModeState */
> +} __packed;
> +
> +struct spdm_certchain_block_header {
> +	u16 length;
> +	u16 reserved;
> +} __packed;

These last two structs do not seem to need it.

> +
> +/*
> + * TDI Report Structure as defined in TDISP.
> + */
> +struct tdi_report_header {
> +	union {
> +		u16 interface_info;
> +		struct {
> +			u16 no_fw_update:1; /* fw updates not permitted in CONFIG_LOCKED or RUN */
> +			u16 dma_no_pasid:1; /* TDI generates DMA requests without PASID */
> +			u16 dma_pasid:1; /* TDI generates DMA requests with PASID */
> +			u16 ats:1; /*  ATS supported and enabled for the TDI */
> +			u16 prs:1; /*  PRS supported and enabled for the TDI */
> +			u16 reserved1:11;
> +		};

Same C-bitfield comment as before, and what about big-endian hosts?

> +	};
> +	u16 reserved2;
> +	u16 msi_x_message_control;
> +	u16 lnr_control;
> +	u32 tph_control;
> +	u32 mmio_range_count;
> +} __packed;
> +
> +/*
> + * Each MMIO Range of the TDI is reported with the MMIO reporting offset added.
> + * Base and size in units of 4K pages
> + */
> +struct tdi_report_mmio_range {
> +	u64 first_page; /* First 4K page with offset added */
> +	u32 num; 	/* Number of 4K pages in this range */
> +	union {
> +		u32 range_attributes;
> +		struct {
> +			u32 msix_table:1;
> +			u32 msix_pba:1;
> +			u32 is_non_tee_mem:1;
> +			u32 is_mem_attr_updatable:1;
> +			u32 reserved:12;
> +			u32 range_id:16;
> +		};
> +	};
> +} __packed;
> +
> +struct tdi_report_footer {
> +	u32 device_specific_info_len;
> +	u8 device_specific_info[];
> +} __packed;
> +
> +#define TDI_REPORT_HDR(rep)		((struct tdi_report_header *) ((rep)->data))
> +#define TDI_REPORT_MR_NUM(rep)		(TDI_REPORT_HDR(rep)->mmio_range_count)
> +#define TDI_REPORT_MR_OFF(rep)		((struct tdi_report_mmio_range *) (TDI_REPORT_HDR(rep) + 1))
> +#define TDI_REPORT_MR(rep, rangeid)	TDI_REPORT_MR_OFF(rep)[rangeid]
> +#define TDI_REPORT_FTR(rep)		((struct tdi_report_footer *) &TDI_REPORT_MR((rep), \
> +					TDI_REPORT_MR_NUM(rep)))
> +
> +/* Physical device descriptor responsible for IDE/TDISP setup */
> +struct tsm_dev {
> +	struct kref kref;

Another kref, which begs the question: why would a tsm_dev need its own
lifetime? This also goes back to the organization in the PCI/TSM
proposal where all TSM objects are bound at most to the lifetime of
whatever is shorter: the registration of the low-level TSM driver or the
PCI device itself.

> +	const struct attribute_group *ag;

PCI device attribute groups are already conveyed in a well known
(lifetime and user visibility) manner. What is motivating this
"re-imagining"?

> +	struct pci_dev *pdev; /* Physical PCI function #0 */
> +	struct tsm_spdm spdm;
> +	struct mutex spdm_mutex;

Is an spdm lock sufficient? I expect the device needs to serialize all
TSM communications, not just spdm? Documentation of the locking would
help.

> +
> +	u8 tc_mask;
> +	u8 cert_slot;
> +	u8 connected;
> +	struct {
> +		u8 enabled:1;
> +		u8 enable:1;
> +		u8 def:1;
> +		u8 dev_ide_cfg:1;
> +		u8 dev_tee_limited:1;
> +		u8 rootport_ide_cfg:1;
> +		u8 rootport_tee_limited:1;
> +		u8 id;
> +	} selective_ide[256];
> +	bool ide_pre;
> +
> +	struct tsm_blob *meas;
> +	struct tsm_blob *certs;

Compare these to the blobs that Lukas maintains for CMA. To my
knowledge there are no new kref lifetime rules independent of the
authenticated lifetime.

> +
> +	void *data; /* Platform specific data */
> +};
> +
> +/* PCI function for passing through, can be the same as tsm_dev::pdev */
> +struct tsm_tdi {
> +	const struct attribute_group *ag;
> +	struct pci_dev *pdev;
> +	struct tsm_dev *tdev;
> +
> +	u8 rseg;
> +	u8 rseg_valid;
> +	bool validated;
> +
> +	struct tsm_blob *report;
> +
> +	void *data; /* Platform specific data */
> +
> +	u64 vmid;
> +	u32 asid;
> +	u16 guest_rid; /* BDFn of PCI Fn in the VM */
> +};
> +
> +struct tsm_dev_status {
> +	u8 ctx_state;
> +	u8 tc_mask;
> +	u8 certs_slot;
> +	u16 device_id;
> +	u16 segment_id;
> +	u8 no_fw_update;
> +	u16 ide_stream_id[8];
> +};
> +
> +enum tsm_spdm_algos {
> +	TSM_TDI_SPDM_ALGOS_DHE_SECP256R1,
> +	TSM_TDI_SPDM_ALGOS_DHE_SECP384R1,
> +	TSM_TDI_SPDM_ALGOS_AEAD_AES_128_GCM,
> +	TSM_TDI_SPDM_ALGOS_AEAD_AES_256_GCM,
> +	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_RSASSA_3072,
> +	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_ECDSA_ECC_NIST_P256,
> +	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_ECDSA_ECC_NIST_P384,
> +	TSM_TDI_SPDM_ALGOS_HASH_TPM_ALG_SHA_256,
> +	TSM_TDI_SPDM_ALGOS_HASH_TPM_ALG_SHA_384,
> +	TSM_TDI_SPDM_ALGOS_KEY_SCHED_SPDM_KEY_SCHEDULE,
> +};
> +
> +enum tsm_tdisp_state {
> +	TDISP_STATE_UNAVAIL,
> +	TDISP_STATE_CONFIG_UNLOCKED,
> +	TDISP_STATE_CONFIG_LOCKED,
> +	TDISP_STATE_RUN,
> +	TDISP_STATE_ERROR,
> +};
> +
> +struct tsm_tdi_status {
> +	bool valid;
> +	u8 meas_digest_fresh:1;
> +	u8 meas_digest_valid:1;
> +	u8 all_request_redirect:1;
> +	u8 bind_p2p:1;
> +	u8 lock_msix:1;
> +	u8 no_fw_update:1;
> +	u16 cache_line_size;
> +	u64 spdm_algos; /* Bitmask of tsm_spdm_algos */
> +	u8 certs_digest[48];
> +	u8 meas_digest[48];
> +	u8 interface_report_digest[48];
> +
> +	/* HV only */
> +	struct tdisp_interface_id id;
> +	u8 guest_report_id[16];
> +	enum tsm_tdisp_state state;
> +};
> +
> +struct tsm_ops {

The lack of documentation for these ops makes review difficult.

> +	/* HV hooks */
> +	int (*dev_connect)(struct tsm_dev *tdev, void *private_data);

Let's not abandon type-safety this early. Is it really the case that all
of these helpers need anonymous globs passed to them?

> +	int (*dev_reclaim)(struct tsm_dev *tdev, void *private_data);
> +	int (*dev_status)(struct tsm_dev *tdev, void *private_data, struct tsm_dev_status *s);
> +	int (*ide_refresh)(struct tsm_dev *tdev, void *private_data);

IDE Key Refresh seems an enhancement worth breaking out of the base
enabling.

> +	int (*tdi_bind)(struct tsm_tdi *tdi, u32 bdfn, u64 vmid, u32 asid, void *private_data);
> +	int (*tdi_reclaim)(struct tsm_tdi *tdi, void *private_data);
> +
> +	int (*guest_request)(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, void *req_data,
> +			     enum tsm_tdisp_state *state, void *private_data);
> +
> +	/* VM hooks */
> +	int (*tdi_validate)(struct tsm_tdi *tdi, bool invalidate, void *private_data);
> +
> +	/* HV and VM hooks */
> +	int (*tdi_status)(struct tsm_tdi *tdi, void *private_data, struct tsm_tdi_status *ts);

Let's not mix HV and VM hooks in the same ops without good reason.

> +};
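
Purely to illustrate the kind of split (struct names invented; tdi_status
is currently used by both sides in this patch):

/* Host / HV side */
struct tsm_host_ops {
	int (*dev_connect)(struct tsm_dev *tdev, void *private_data);
	int (*dev_reclaim)(struct tsm_dev *tdev, void *private_data);
	int (*tdi_bind)(struct tsm_tdi *tdi, u32 bdfn, u64 vmid, u32 asid,
			void *private_data);
	int (*tdi_reclaim)(struct tsm_tdi *tdi, void *private_data);
};

/* Guest / VM side */
struct tsm_guest_ops {
	int (*tdi_validate)(struct tsm_tdi *tdi, bool invalidate,
			    void *private_data);
	int (*tdi_status)(struct tsm_tdi *tdi, void *private_data,
			  struct tsm_tdi_status *ts);
};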
> +
> +void tsm_set_ops(struct tsm_ops *ops, void *private_data);
> +struct tsm_tdi *tsm_tdi_get(struct device *dev);
> +int tsm_tdi_bind(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, u32 asid);
> +void tsm_tdi_unbind(struct tsm_tdi *tdi);
> +int tsm_guest_request(struct tsm_tdi *tdi, enum tsm_tdisp_state *state, void *req_data);
> +struct tsm_tdi *tsm_tdi_find(u32 guest_rid, u64 vmid);
> +
> +int pci_dev_tdi_validate(struct pci_dev *pdev);
> +ssize_t tsm_report_gen(struct tsm_blob *report, char *b, size_t len);
> +
> +#endif /* LINUX_TSM_H */
> diff --git a/drivers/virt/coco/tsm.c b/drivers/virt/coco/tsm.c
> new file mode 100644
> index 000000000000..e90455a0267f
> --- /dev/null
> +++ b/drivers/virt/coco/tsm.c
> @@ -0,0 +1,1336 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/pci-doe.h>
> +#include <linux/pci-ide.h>
> +#include <linux/file.h>
> +#include <linux/fdtable.h>
> +#include <linux/tsm.h>
> +#include <linux/kvm_host.h>
> +
> +#define DRIVER_VERSION	"0.1"
> +#define DRIVER_AUTHOR	"aik@amd.com"
> +#define DRIVER_DESC	"TSM TDISP driver"
> +
> +static struct {
> +	struct tsm_ops *ops;
> +	void *private_data;
> +
> +	uint tc_mask;
> +	uint cert_slot;
> +	bool physfn;
> +} tsm;
> +
> +module_param_named(tc_mask, tsm.tc_mask, uint, 0644);
> +MODULE_PARM_DESC(tc_mask, "Mask of traffic classes enabled in the device");
> +
> +module_param_named(cert_slot, tsm.cert_slot, uint, 0644);
> +MODULE_PARM_DESC(cert_slot, "Slot number of the certificate requested for constructing the SPDM session");
> +
> +module_param_named(physfn, tsm.physfn, bool, 0644);
> +MODULE_PARM_DESC(physfn, "Allow TDI on SR IOV of a physical function");

No. Let's build proper uAPI for these. These TSM global parameters are
what I envisioned hanging off of the global TSM class device.

[..]
> +/*
> + * Enables IDE between the RC and the device.
> + * TEE Limited, IDE Cfg space and other bits are hardcoded
> + * as this is a sketch.

It would help to know how in depth to review the pieces if there were
more pointers to "this is a serious proposal" vs. "this is a sketch".

> + */
> +static int tsm_set_sel_ide(struct tsm_dev *tdev)

I find the "sel" abbreviation too short to be useful. Perhaps let's just
call "Selective IDE" "ide" and "Link IDE" "link_ide", since "Selective"
is the common case.

> +{
> +	struct pci_dev *rootport;
> +	bool printed = false;
> +	unsigned int i;
> +	int ret = 0;
> +
> +	rootport = tdev->pdev->bus->self;

Does this assume no intervening IDE switches?

> +	for (i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
> +		if (!tdev->selective_ide[i].enable)
> +			continue;
> +
> +		if (!printed) {
> +			pci_info(rootport, "Configuring IDE with %s\n",
> +				 pci_name(tdev->pdev));
> +			printed = true;

Why so chatty? Just make it pci_dbg() and be done.

> +		}
> +		WARN_ON_ONCE(tdev->selective_ide[i].enabled);

Crash the kernel if IDE is already enabled??

> +
> +		ret = pci_ide_set_sel_rid_assoc(tdev->pdev, i, true, 0, 0, 0xFFFF);
> +		if (ret)
> +			pci_warn(tdev->pdev,
> +				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
> +				 i, ret);
> +		ret = pci_ide_set_sel_addr_assoc(tdev->pdev, i, 0/* RID# */, true,
> +						 0, 0xFFFFFFFFFFF00000ULL);
> +		if (ret)
> +			pci_warn(tdev->pdev,
> +				 "Failed configuring SelectiveIDE#%d RID#0 with %d\n",
> +				 i, ret);
> +
> +		ret = pci_ide_set_sel(tdev->pdev, i,
> +				      tdev->selective_ide[i].id,
> +				      tdev->selective_ide[i].enable,
> +				      tdev->selective_ide[i].def,
> +				      tdev->selective_ide[i].dev_tee_limited,
> +				      tdev->selective_ide[i].dev_ide_cfg);

This feels kludgy. IDE is a fundamental mechanism of a PCI device; why
would a PCI core helper not know how to extract the settings from a
pdev?

Something like:

pci_ide_setup_stream(pdev, i)

> +		if (ret) {
> +			pci_warn(tdev->pdev,
> +				 "Failed configuring SelectiveIDE#%d with %d\n",
> +				 i, ret);
> +			break;
> +		}
> +
> +		ret = pci_ide_set_sel_rid_assoc(rootport, i, true, 0, 0, 0xFFFF);
> +		if (ret)
> +			pci_warn(rootport,
> +				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
> +				 i, ret);
> +
> +		ret = pci_ide_set_sel(rootport, i,

Perhaps:

pci_ide_host_setup_stream(pdev, i)

...I expect the helper should be able to figure out the rootport and RID
association.
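
Very roughly, and only to show the shape (the 'ide' array and 'struct
pci_ide_sel' below are made up, standing in for wherever the PCI core
would keep the per-stream settings; pcie_find_root_port() is the
existing helper):

	int pci_ide_host_setup_stream(struct pci_dev *pdev, unsigned int sel)
	{
		struct pci_dev *rp = pcie_find_root_port(pdev);
		struct pci_ide_sel *s = &pdev->ide[sel];	/* hypothetical */
		int ret;

		if (!rp)
			return -ENODEV;

		/* same RID association that tsm_set_sel_ide() open-codes today */
		ret = pci_ide_set_sel_rid_assoc(rp, sel, true, 0, 0, 0xFFFF);
		if (ret)
			return ret;

		return pci_ide_set_sel(rp, sel, s->id, s->enable, s->def,
				       s->tee_limited, s->ide_cfg);
	}

...i.e. the caller passes a pdev and a stream index and nothing else.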

> +				      tdev->selective_ide[i].id,
> +				      tdev->selective_ide[i].enable,
> +				      tdev->selective_ide[i].def,
> +				      tdev->selective_ide[i].rootport_tee_limited,
> +				      tdev->selective_ide[i].rootport_ide_cfg);
> +		if (ret)
> +			pci_warn(rootport,
> +				 "Failed configuring SelectiveIDE#%d with %d\n",
> +				 i, ret);
> +
> +		tdev->selective_ide[i].enabled = 1;
> +	}
> +
> +	return ret;
> +}
> +
> +static void tsm_unset_sel_ide(struct tsm_dev *tdev)
> +{
> +	struct pci_dev *rootport = tdev->pdev->bus->self;
> +	bool printed = false;
> +
> +	for (unsigned int i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
> +		if (!tdev->selective_ide[i].enabled)
> +			continue;
> +
> +		if (!printed) {
> +			pci_info(rootport, "Deconfiguring IDE with %s\n", pci_name(tdev->pdev));
> +			printed = true;
> +		}
> +
> +		pci_ide_set_sel(rootport, i, 0, 0, 0, false, false);
> +		pci_ide_set_sel(tdev->pdev, i, 0, 0, 0, false, false);

These calls are unreadable; how about:

pci_ide_host_destroy_stream(pdev, i)
pci_ide_destroy_stream(pdev, i)


> +static int tsm_dev_connect(struct tsm_dev *tdev, void *private_data, unsigned int val)
> +{
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->dev_connect))
> +		return -EPERM;

How does a device get this far into the flow with a TSM that does not
define the "connect" verb?

> +
> +	tdev->ide_pre = val == 2;
> +	if (tdev->ide_pre)
> +		tsm_set_sel_ide(tdev);
> +
> +	mutex_lock(&tdev->spdm_mutex);
> +	while (1) {
> +		ret = tsm.ops->dev_connect(tdev, tsm.private_data);
> +		if (ret <= 0)
> +			break;
> +
> +		ret = spdm_forward(&tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdev->spdm_mutex);
> +
> +	if (!tdev->ide_pre)
> +		ret = tsm_set_sel_ide(tdev);
> +
> +	tdev->connected = (ret == 0);
> +
> +	return ret;
> +}
> +
> +static int tsm_dev_reclaim(struct tsm_dev *tdev, void *private_data)
> +{
> +	struct pci_dev *pdev = NULL;
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->dev_reclaim))
> +		return -EPERM;

Similar comment about how this could happen and why crashing the kernel
is ok.

> +
> +	/* Do not disconnect with active TDIs */
> +	for_each_pci_dev(pdev) {
> +		struct tsm_tdi *tdi = tsm_tdi_get(&pdev->dev);
> +
> +		if (tdi && tdi->tdev == tdev && tdi->data)
> +			return -EBUSY;

I would expect that removing things out of order causes violence, not
blocking it.

For example you can remove disk drivers while filesystems are still
mounted. What is the administrator's recourse if they *do* want to
shutdown the TSM layer all at once?

> +	}
> +
> +	if (!tdev->ide_pre)
> +		tsm_unset_sel_ide(tdev);
> +
> +	mutex_lock(&tdev->spdm_mutex);
> +	while (1) {
> +		ret = tsm.ops->dev_reclaim(tdev, private_data);
> +		if (ret <= 0)
> +			break;

What is the "reclaim" verb? Is this just a destructor? Does "disconnect"
not sufficiently clean up the device context?

> +
> +		ret = spdm_forward(&tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdev->spdm_mutex);
> +
> +	if (tdev->ide_pre)
> +		tsm_unset_sel_ide(tdev);
> +
> +	if (!ret)
> +		tdev->connected = false;
> +
> +	return ret;
> +}
> +
> +static int tsm_dev_status(struct tsm_dev *tdev, void *private_data, struct tsm_dev_status *s)
> +{
> +	if (WARN_ON(!tsm.ops->dev_status))
> +		return -EPERM;
> +
> +	return tsm.ops->dev_status(tdev, private_data, s);

This is asking for better defined semantics.

> +}
> +
> +static int tsm_ide_refresh(struct tsm_dev *tdev, void *private_data)
> +{
> +	int ret;
> +
> +	if (!tsm.ops->ide_refresh)
> +		return -EPERM;
> +
> +	mutex_lock(&tdev->spdm_mutex);
> +	while (1) {
> +		ret = tsm.ops->ide_refresh(tdev, private_data);
> +		if (ret <= 0)
> +			break;

Why is refresh not "connect"? I.e. connecting an already connected
device refreshes the connection.

> +
> +		ret = spdm_forward(&tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdev->spdm_mutex);
> +
> +	return ret;
> +}
> +
> +static void tsm_tdi_reclaim(struct tsm_tdi *tdi, void *private_data)
> +{
> +	int ret;
> +
> +	if (WARN_ON(!tsm.ops->tdi_reclaim))
> +		return;
> +
> +	mutex_lock(&tdi->tdev->spdm_mutex);
> +	while (1) {
> +		ret = tsm.ops->tdi_reclaim(tdi, private_data);
> +		if (ret <= 0)
> +			break;

What is involved in tdi "reclaim" separately from "unbind"?
"dev_reclaim" and "tdi_reclaim" seem less precise than "disconnect" and
"unbind".

> +
> +		ret = spdm_forward(&tdi->tdev->spdm, ret);
> +		if (ret < 0)
> +			break;
> +	}
> +	mutex_unlock(&tdi->tdev->spdm_mutex);
> +}
> +
> +static int tsm_tdi_validate(struct tsm_tdi *tdi, bool invalidate, void *private_data)
> +{
> +	int ret;
> +
> +	if (!tdi || !tsm.ops->tdi_validate)
> +		return -EPERM;
> +
> +	ret = tsm.ops->tdi_validate(tdi, invalidate, private_data);
> +	if (ret) {
> +		pci_err(tdi->pdev, "Validation failed, ret=%d", ret);
> +		tdi->pdev->dev.tdi_enabled = false;
> +	}
> +
> +	return ret;
> +}
> +
> +/* In case BUS_NOTIFY_PCI_BUS_MASTER is no good, a driver can call pci_dev_tdi_validate() */

No. TDISP is a fundamental re-imagining of the PCI device security
model. It deserves first class support in the PCI core, not bolted on
support via bus notifiers.

[..]

I hesitate to keep commenting because this is so far off of the lifetime
and code organization expectations I thought we were negotiating with
the PCI/TSM series. So I will stop here for now.
Alexey Kardashevskiy Sept. 4, 2024, 11:13 a.m. UTC | #15
On 4/9/24 09:51, Dan Williams wrote:
> Alexey Kardashevskiy wrote:
>> The module responsibilities are:
>> 1. detect TEE support in a device and create nodes in the device's sysfs
>> entry;
>> 2. allow binding a PCI device to a VM for passing it through in a trusted
>> manner;
>> 3. store measurements/certificates/reports and provide access to those for
>> the userspace via sysfs.
>>
>> This relies on the platform to register a set of callbacks,
>> for both host and guest.
>>
>> And tdi_enabled in the device struct.
> 
> I had been holding out hope that when I got this patch the changelog
> would give some justification for what folks had been whispering to me
> in recent days: "hey Dan, looks like Alexey is completely ignoring the
> PCI/TSM approach?".
> 
> Bjorn acked that approach here:
> 
> http://lore.kernel.org/20240419220729.GA307280@bhelgaas
> 
> It is in need of a refresh, preview here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux.git/commit/?id=5807465b92ac
> 
> At best, I am disappointed that this RFC ignored it. More comments
> below, but please do clarify if we are working together on a Bjorn-acked
> direction, or not.

Together.

My problem with that patchset is that it only does connect/disconnect
and no TDISP business (and I need both for my exercise). I was hoping
to see a TDISP-aware git tree, but that has not happened yet, so I
postponed rebasing onto it due to the lack of time and the apparent
differences between your TSM and mine (and I had mine working before I
saw yours, so I focused on making things work for a start). Sorry, I
should have spoken louder. Or listened better to that whispering. Or
rebased earlier.


> 
>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
>> ---
>>   drivers/virt/coco/Makefile      |    1 +
>>   include/linux/device.h          |    5 +
>>   include/linux/tsm.h             |  263 ++++
>>   drivers/virt/coco/tsm.c         | 1336 ++++++++++++++++++++
>>   Documentation/virt/coco/tsm.rst |   62 +
>>   drivers/virt/coco/Kconfig       |   11 +
>>   6 files changed, 1678 insertions(+)
>>
>> diff --git a/drivers/virt/coco/Makefile b/drivers/virt/coco/Makefile
>> index 75defec514f8..5d1aefb62714 100644
>> --- a/drivers/virt/coco/Makefile
>> +++ b/drivers/virt/coco/Makefile
>> @@ -3,6 +3,7 @@
>>   # Confidential computing related collateral
>>   #
>>   obj-$(CONFIG_TSM_REPORTS)	+= tsm-report.o
>> +obj-$(CONFIG_TSM)		+= tsm.o
> 
> The expectation is that something like drivers/virt/coco/tsm.c would be
> the class driver for cross-vendor generic TSM uAPI. The PCI specific
> bits go in drivers/pci/tsm.c.
> 
>>   obj-$(CONFIG_EFI_SECRET)	+= efi_secret/
>>   obj-$(CONFIG_SEV_GUEST)		+= sev-guest/
>>   obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx-guest/
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 34eb20f5966f..bb58ed1fb8da 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -45,6 +45,7 @@ struct fwnode_handle;
>>   struct iommu_group;
>>   struct dev_pin_info;
>>   struct dev_iommu;
>> +struct tsm_tdi;
>>   struct msi_device_data;
>>   
>>   /**
>> @@ -801,6 +802,7 @@ struct device {
>>   	void	(*release)(struct device *dev);
>>   	struct iommu_group	*iommu_group;
>>   	struct dev_iommu	*iommu;
>> +	struct tsm_tdi		*tdi;
> 
> No. The only known device model for TDIs is PCI devices, i.e. TDISP is a
> PCI protocol. Even SPDM which is cross device-type generic did not touch
> 'struct device'.

TDISP is PCI but DMA is not. This is for:
[RFC PATCH 19/21] sev-guest: Stop changing encrypted page state for 
TDISP devices

The DMA layer deals with struct device and tries hard to avoid indirect
_ops calls, so I was looking for a place for "tdi_enabled" (a bad name,
perhaps; maybe call it "dma_encrypted", a few lines below). So I keep
the flag and the pointer together for the RFC. I am hoping for a better
solution for 19/21, and then I will absolutely move tdi* to pci_dev
(well, drop these and just use yours).
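
The kind of check I want to keep cheap is along these lines (a sketch
only, not the actual 19/21 code; the helper name is made up):

	/* somewhere on the dma-direct/swiotlb decision path */
	static inline bool dev_dma_stays_encrypted(struct device *dev)
	{
		return dev->tdi_enabled;
	}

i.e. a plain flag test on struct device, no indirect call and no digging
through pci_dev on the DMA hot path.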


>>   	struct device_physical_location *physical_location;
>>   
>> @@ -822,6 +824,9 @@ struct device {
>>   #ifdef CONFIG_DMA_NEED_SYNC
>>   	bool			dma_skip_sync:1;
>>   #endif
>> +#if defined(CONFIG_TSM) || defined(CONFIG_TSM_MODULE)
>> +	bool			tdi_enabled:1;
>> +#endif
>>   };
>>   
>>   /**
>> diff --git a/include/linux/tsm.h b/include/linux/tsm.h
>> new file mode 100644
>> index 000000000000..d48eceaf5bc0
>> --- /dev/null
>> +++ b/include/linux/tsm.h
>> @@ -0,0 +1,263 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef LINUX_TSM_H
>> +#define LINUX_TSM_H
>> +
>> +#include <linux/cdev.h>
>> +
>> +/* SPDM control structure for DOE */
>> +struct tsm_spdm {
>> +	unsigned long req_len;
>> +	void *req;
>> +	unsigned long rsp_len;
>> +	void *rsp;
>> +
>> +	struct pci_doe_mb *doe_mb;
>> +	struct pci_doe_mb *doe_mb_secured;
>> +};
>> +
>> +/* Data object for measurements/certificates/attestationreport */
>> +struct tsm_blob {
>> +	void *data;
>> +	size_t len;
>> +	struct kref kref;
>> +	void (*release)(struct tsm_blob *b);
> 
> Hard to judge the suitability of these without documentation. Skeptical
> that these TDISP evidence blobs need to have a lifetime distinct from
> the device's TSM context.

You are right.

>> +};
>> +
>> +struct tsm_blob *tsm_blob_new(void *data, size_t len, void (*release)(struct tsm_blob *b));
>> +struct tsm_blob *tsm_blob_get(struct tsm_blob *b);
>> +void tsm_blob_put(struct tsm_blob *b);
>> +
>> +/**
>> + * struct tdisp_interface_id - TDISP INTERFACE_ID Definition
>> + *
>> + * @function_id: Identifies the function of the device hosting the TDI
>> + * 15:0: @rid: Requester ID
>> + * 23:16: @rseg: Requester Segment (Reserved if Requester Segment Valid is Clear)
>> + * 24: @rseg_valid: Requester Segment Valid
>> + * 31:25 – Reserved
>> + * 8B - Reserved
>> + */
>> +struct tdisp_interface_id {
>> +	union {
>> +		struct {
>> +			u32 function_id;
>> +			u8 reserved[8];
>> +		};
>> +		struct {
>> +			u16 rid;
>> +			u8 rseg;
>> +			u8 rseg_valid:1;
> 
> Linux typically avoids C-bitfields in hardware interfaces in favor of
> bitfield.h macros.
> >> +		};
>> +	};
>> +} __packed;
> 
> Does this need to be "packed"? Looks naturally aligned to pahole.

"__packed" is also a way to say it is a binary interface, I want to be 
precise about this.


>> +
>> +/*
>> + * Measurement block as defined in SPDM DSP0274.
>> + */
>> +struct spdm_measurement_block_header {
>> +	u8 index;
>> +	u8 spec; /* MeasurementSpecification */
>> +	u16 size;
>> +} __packed;
>> +
>> +struct dmtf_measurement_block_header {
>> +	u8 type;  /* DMTFSpecMeasurementValueType */
>> +	u16 size; /* DMTFSpecMeasurementValueSize */
>> +} __packed;
> 
> This one might need to be packed.
> 
>> +
>> +struct dmtf_measurement_block_device_mode {
>> +	u32 opmode_cap;	 /* OperationalModeCapabilties */
>> +	u32 opmode_sta;  /* OperationalModeState */
>> +	u32 devmode_cap; /* DeviceModeCapabilties */
>> +	u32 devmode_sta; /* DeviceModeState */
>> +} __packed;
>> +
>> +struct spdm_certchain_block_header {
>> +	u16 length;
>> +	u16 reserved;
>> +} __packed;
> 
> These last 2 struct do not seem to need it.
> 
>> +
>> +/*
>> + * TDI Report Structure as defined in TDISP.
>> + */
>> +struct tdi_report_header {
>> +	union {
>> +		u16 interface_info;
>> +		struct {
>> +			u16 no_fw_update:1; /* fw updates not permitted in CONFIG_LOCKED or RUN */
>> +			u16 dma_no_pasid:1; /* TDI generates DMA requests without PASID */
>> +			u16 dma_pasid:1; /* TDI generates DMA requests with PASID */
>> +			u16 ats:1; /*  ATS supported and enabled for the TDI */
>> +			u16 prs:1; /*  PRS supported and enabled for the TDI */
>> +			u16 reserved1:11;
>> +		};
> 
> Same C-bitfield comment, as before, and what about big endian hosts?

Right, I'll get rid of c-bitfields in the common parts.

Although I am curious what big-endian platform is going to actually 
support this.


>> +	};
>> +	u16 reserved2;
>> +	u16 msi_x_message_control;
>> +	u16 lnr_control;
>> +	u32 tph_control;
>> +	u32 mmio_range_count;
>> +} __packed;
>> +
>> +/*
>> + * Each MMIO Range of the TDI is reported with the MMIO reporting offset added.
>> + * Base and size in units of 4K pages
>> + */
>> +struct tdi_report_mmio_range {
>> +	u64 first_page; /* First 4K page with offset added */
>> +	u32 num; 	/* Number of 4K pages in this range */
>> +	union {
>> +		u32 range_attributes;
>> +		struct {
>> +			u32 msix_table:1;
>> +			u32 msix_pba:1;
>> +			u32 is_non_tee_mem:1;
>> +			u32 is_mem_attr_updatable:1;
>> +			u32 reserved:12;
>> +			u32 range_id:16;
>> +		};
>> +	};
>> +} __packed;
>> +
>> +struct tdi_report_footer {
>> +	u32 device_specific_info_len;
>> +	u8 device_specific_info[];
>> +} __packed;
>> +
>> +#define TDI_REPORT_HDR(rep)		((struct tdi_report_header *) ((rep)->data))
>> +#define TDI_REPORT_MR_NUM(rep)		(TDI_REPORT_HDR(rep)->mmio_range_count)
>> +#define TDI_REPORT_MR_OFF(rep)		((struct tdi_report_mmio_range *) (TDI_REPORT_HDR(rep) + 1))
>> +#define TDI_REPORT_MR(rep, rangeid)	TDI_REPORT_MR_OFF(rep)[rangeid]
>> +#define TDI_REPORT_FTR(rep)		((struct tdi_report_footer *) &TDI_REPORT_MR((rep), \
>> +					TDI_REPORT_MR_NUM(rep)))
>> +
>> +/* Physical device descriptor responsible for IDE/TDISP setup */
>> +struct tsm_dev {
>> +	struct kref kref;
> 
> Another kref that begs the question why would a tsm_dev need its own
> lifetime? This also goes back to the organization in the PCI/TSM
> proposal that all TSM objects are at max bound to the lifetime of
> whatever is shorter, the registration of the low-level TSM driver or the
> PCI device itself.


That proposal deals with PFs for now and skips TDIs. Since a TDI needs
its place in pci_dev too, and I wanted to add the bare minimum to struct
device or pci_dev, I only add TDIs, and each of them references a DEV.
Enough to get me going.


>> +	const struct attribute_group *ag;
> 
> PCI device attribute groups are already conveyed in a well known
> (lifetime and user visibility) manner. What is motivating this
> "re-imagining"?
> 
>> +	struct pci_dev *pdev; /* Physical PCI function #0 */
>> +	struct tsm_spdm spdm;
>> +	struct mutex spdm_mutex;
> 
> Is an spdm lock sufficient? I expect the device needs to serialize all
> TSM communications, not just spdm? Documentation of the locking would
> help.

What other communication do you mean here?


>> +
>> +	u8 tc_mask;
>> +	u8 cert_slot;
>> +	u8 connected;
>> +	struct {
>> +		u8 enabled:1;
>> +		u8 enable:1;
>> +		u8 def:1;
>> +		u8 dev_ide_cfg:1;
>> +		u8 dev_tee_limited:1;
>> +		u8 rootport_ide_cfg:1;
>> +		u8 rootport_tee_limited:1;
>> +		u8 id;
>> +	} selective_ide[256];
>> +	bool ide_pre;
>> +
>> +	struct tsm_blob *meas;
>> +	struct tsm_blob *certs;
> 
> Compare these to the blobs that Lukas that maintains for CMA. To my
> knoweldge no new kref lifetime rules independent of the authenticated
> lifetime.
> 
>> +
>> +	void *data; /* Platform specific data */
>> +};
>> +
>> +/* PCI function for passing through, can be the same as tsm_dev::pdev */
>> +struct tsm_tdi {
>> +	const struct attribute_group *ag;
>> +	struct pci_dev *pdev;
>> +	struct tsm_dev *tdev;
>> +
>> +	u8 rseg;
>> +	u8 rseg_valid;
>> +	bool validated;
>> +
>> +	struct tsm_blob *report;
>> +
>> +	void *data; /* Platform specific data */
>> +
>> +	u64 vmid;
>> +	u32 asid;
>> +	u16 guest_rid; /* BDFn of PCI Fn in the VM */
>> +};
>> +
>> +struct tsm_dev_status {
>> +	u8 ctx_state;
>> +	u8 tc_mask;
>> +	u8 certs_slot;
>> +	u16 device_id;
>> +	u16 segment_id;
>> +	u8 no_fw_update;
>> +	u16 ide_stream_id[8];
>> +};
>> +
>> +enum tsm_spdm_algos {
>> +	TSM_TDI_SPDM_ALGOS_DHE_SECP256R1,
>> +	TSM_TDI_SPDM_ALGOS_DHE_SECP384R1,
>> +	TSM_TDI_SPDM_ALGOS_AEAD_AES_128_GCM,
>> +	TSM_TDI_SPDM_ALGOS_AEAD_AES_256_GCM,
>> +	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_RSASSA_3072,
>> +	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_ECDSA_ECC_NIST_P256,
>> +	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_ECDSA_ECC_NIST_P384,
>> +	TSM_TDI_SPDM_ALGOS_HASH_TPM_ALG_SHA_256,
>> +	TSM_TDI_SPDM_ALGOS_HASH_TPM_ALG_SHA_384,
>> +	TSM_TDI_SPDM_ALGOS_KEY_SCHED_SPDM_KEY_SCHEDULE,
>> +};
>> +
>> +enum tsm_tdisp_state {
>> +	TDISP_STATE_UNAVAIL,
>> +	TDISP_STATE_CONFIG_UNLOCKED,
>> +	TDISP_STATE_CONFIG_LOCKED,
>> +	TDISP_STATE_RUN,
>> +	TDISP_STATE_ERROR,
>> +};
>> +
>> +struct tsm_tdi_status {
>> +	bool valid;
>> +	u8 meas_digest_fresh:1;
>> +	u8 meas_digest_valid:1;
>> +	u8 all_request_redirect:1;
>> +	u8 bind_p2p:1;
>> +	u8 lock_msix:1;
>> +	u8 no_fw_update:1;
>> +	u16 cache_line_size;
>> +	u64 spdm_algos; /* Bitmask of tsm_spdm_algos */
>> +	u8 certs_digest[48];
>> +	u8 meas_digest[48];
>> +	u8 interface_report_digest[48];
>> +
>> +	/* HV only */
>> +	struct tdisp_interface_id id;
>> +	u8 guest_report_id[16];
>> +	enum tsm_tdisp_state state;
>> +};
>> +
>> +struct tsm_ops {
> 
> The lack of documentation for these ops makes review difficult.
> 
>> +	/* HV hooks */
>> +	int (*dev_connect)(struct tsm_dev *tdev, void *private_data);
> 
> Lets not abandon type-safety this early. Is it really the case that all
> of these helpers need anonymous globs passed to them?
> >> +	int (*dev_reclaim)(struct tsm_dev *tdev, void *private_data);
>> +	int (*dev_status)(struct tsm_dev *tdev, void *private_data, struct tsm_dev_status *s);
>> +	int (*ide_refresh)(struct tsm_dev *tdev, void *private_data);
> 
> IDE Key Refresh seems an enhancement worth breaking out of the base
> enabling.
> 
>> +	int (*tdi_bind)(struct tsm_tdi *tdi, u32 bdfn, u64 vmid, u32 asid, void *private_data);
>> +	int (*tdi_reclaim)(struct tsm_tdi *tdi, void *private_data);
>> +
>> +	int (*guest_request)(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, void *req_data,
>> +			     enum tsm_tdisp_state *state, void *private_data);
>> +
>> +	/* VM hooks */
>> +	int (*tdi_validate)(struct tsm_tdi *tdi, bool invalidate, void *private_data);
>> +
>> +	/* HV and VM hooks */
>> +	int (*tdi_status)(struct tsm_tdi *tdi, void *private_data, struct tsm_tdi_status *ts);
> 
> Lets not mix HV and VM hooks in the same ops without good reason.
> 
>> +};
>> +
>> +void tsm_set_ops(struct tsm_ops *ops, void *private_data);
>> +struct tsm_tdi *tsm_tdi_get(struct device *dev);
>> +int tsm_tdi_bind(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, u32 asid);
>> +void tsm_tdi_unbind(struct tsm_tdi *tdi);
>> +int tsm_guest_request(struct tsm_tdi *tdi, enum tsm_tdisp_state *state, void *req_data);
>> +struct tsm_tdi *tsm_tdi_find(u32 guest_rid, u64 vmid);
>> +
>> +int pci_dev_tdi_validate(struct pci_dev *pdev);
>> +ssize_t tsm_report_gen(struct tsm_blob *report, char *b, size_t len);
>> +
>> +#endif /* LINUX_TSM_H */
>> diff --git a/drivers/virt/coco/tsm.c b/drivers/virt/coco/tsm.c
>> new file mode 100644
>> index 000000000000..e90455a0267f
>> --- /dev/null
>> +++ b/drivers/virt/coco/tsm.c
>> @@ -0,0 +1,1336 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +#include <linux/module.h>
>> +#include <linux/pci.h>
>> +#include <linux/pci-doe.h>
>> +#include <linux/pci-ide.h>
>> +#include <linux/file.h>
>> +#include <linux/fdtable.h>
>> +#include <linux/tsm.h>
>> +#include <linux/kvm_host.h>
>> +
>> +#define DRIVER_VERSION	"0.1"
>> +#define DRIVER_AUTHOR	"aik@amd.com"
>> +#define DRIVER_DESC	"TSM TDISP driver"
>> +
>> +static struct {
>> +	struct tsm_ops *ops;
>> +	void *private_data;
>> +
>> +	uint tc_mask;
>> +	uint cert_slot;
>> +	bool physfn;
>> +} tsm;
>> +
>> +module_param_named(tc_mask, tsm.tc_mask, uint, 0644);
>> +MODULE_PARM_DESC(tc_mask, "Mask of traffic classes enabled in the device");
>> +
>> +module_param_named(cert_slot, tsm.cert_slot, uint, 0644);
>> +MODULE_PARM_DESC(cert_slot, "Slot number of the certificate requested for constructing the SPDM session");
>> +
>> +module_param_named(physfn, tsm.physfn, bool, 0644);
>> +MODULE_PARM_DESC(physfn, "Allow TDI on SR IOV of a physical function");
> 
> No. Lets build proper uAPI for these. These TSM global parameters are
> what I envisioned hanging off of the global TSM class device.
> 
> [..]
>> +/*
>> + * Enables IDE between the RC and the device.
>> + * TEE Limited, IDE Cfg space and other bits are hardcoded
>> + * as this is a sketch.
> 
> It would help to know how in depth to review the pieces if there were
> more pointers of "this is serious proposal", and "this is a sketch".

Largely the latter; remember to keep appreciating the "release early"
aspect of it :)

It is a sketch which has been tested on the hardware with both KVM and
an SNP VM, which (I thought) had some value if posted before the LPC. I
should have made that clearer though.


>> + */
>> +static int tsm_set_sel_ide(struct tsm_dev *tdev)
> 
> I find the "sel" abbreviation too short to be useful. Perhaps lets just
> call "Selective IDE" "ide" and "Link IDE" "link_ide". Since "Selective"
> is the common case.
> 
>> +{
>> +	struct pci_dev *rootport;
>> +	bool printed = false;
>> +	unsigned int i;
>> +	int ret = 0;
>> +
>> +	rootport = tdev->pdev->bus->self;
> 
> Does this assume no intervening IDE switches?
> 
>> +	for (i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
>> +		if (!tdev->selective_ide[i].enable)
>> +			continue;
>> +
>> +		if (!printed) {
>> +			pci_info(rootport, "Configuring IDE with %s\n",
>> +				 pci_name(tdev->pdev));
>> +			printed = true;
> 
> Why so chatty? Just make if pci_dbg() and be done.
> 
>> +		}
>> +		WARN_ON_ONCE(tdev->selective_ide[i].enabled);
> 
> Crash the kernel if IDE is already enabled??
> 
>> +
>> +		ret = pci_ide_set_sel_rid_assoc(tdev->pdev, i, true, 0, 0, 0xFFFF);
>> +		if (ret)
>> +			pci_warn(tdev->pdev,
>> +				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
>> +				 i, ret);
>> +		ret = pci_ide_set_sel_addr_assoc(tdev->pdev, i, 0/* RID# */, true,
>> +						 0, 0xFFFFFFFFFFF00000ULL);
>> +		if (ret)
>> +			pci_warn(tdev->pdev,
>> +				 "Failed configuring SelectiveIDE#%d RID#0 with %d\n",
>> +				 i, ret);
>> +
>> +		ret = pci_ide_set_sel(tdev->pdev, i,
>> +				      tdev->selective_ide[i].id,
>> +				      tdev->selective_ide[i].enable,
>> +				      tdev->selective_ide[i].def,
>> +				      tdev->selective_ide[i].dev_tee_limited,
>> +				      tdev->selective_ide[i].dev_ide_cfg);
> 
> This feels kludgy. IDE is a fundamental mechanism of a PCI device why
> would a PCI core helper not know how to extract the settings from a
> pdev?
> 
> Something like:
> 
> pci_ide_setup_stream(pdev, i)


It is unclear to me how we decide which stream(s) need enabling and
what flags to set. Who decides - a driver? A daemon/user?

> 
>> +		if (ret) {
>> +			pci_warn(tdev->pdev,
>> +				 "Failed configuring SelectiveIDE#%d with %d\n",
>> +				 i, ret);
>> +			break;
>> +		}
>> +
>> +		ret = pci_ide_set_sel_rid_assoc(rootport, i, true, 0, 0, 0xFFFF);
>> +		if (ret)
>> +			pci_warn(rootport,
>> +				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
>> +				 i, ret);
>> +
>> +		ret = pci_ide_set_sel(rootport, i,
> 
> Perhaps:
> 
> pci_ide_host_setup_stream(pdev, i)
> 
> ...I expect the helper should be able to figure out the rootport and RID
> association.

Where will the helper get the properties from?


>> +				      tdev->selective_ide[i].id,
>> +				      tdev->selective_ide[i].enable,
>> +				      tdev->selective_ide[i].def,
>> +				      tdev->selective_ide[i].rootport_tee_limited,
>> +				      tdev->selective_ide[i].rootport_ide_cfg);
>> +		if (ret)
>> +			pci_warn(rootport,
>> +				 "Failed configuring SelectiveIDE#%d with %d\n",
>> +				 i, ret);
>> +
>> +		tdev->selective_ide[i].enabled = 1;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +static void tsm_unset_sel_ide(struct tsm_dev *tdev)
>> +{
>> +	struct pci_dev *rootport = tdev->pdev->bus->self;
>> +	bool printed = false;
>> +
>> +	for (unsigned int i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
>> +		if (!tdev->selective_ide[i].enabled)
>> +			continue;
>> +
>> +		if (!printed) {
>> +			pci_info(rootport, "Deconfiguring IDE with %s\n", pci_name(tdev->pdev));
>> +			printed = true;
>> +		}
>> +
>> +		pci_ide_set_sel(rootport, i, 0, 0, 0, false, false);
>> +		pci_ide_set_sel(tdev->pdev, i, 0, 0, 0, false, false);
> 
> These calls are unreadable, how about:
> 
> pci_ide_host_destroy_stream(pdev, i)
> pci_ide_destroy_stream(pdev, i)
> 
> 
>> +static int tsm_dev_connect(struct tsm_dev *tdev, void *private_data, unsigned int val)
>> +{
>> +	int ret;
>> +
>> +	if (WARN_ON(!tsm.ops->dev_connect))
>> +		return -EPERM;
> 
> How does a device get this far into the flow with a TSM that does not
> define the "connect" verb?
> 
>> +
>> +	tdev->ide_pre = val == 2;
>> +	if (tdev->ide_pre)
>> +		tsm_set_sel_ide(tdev);
>> +
>> +	mutex_lock(&tdev->spdm_mutex);
>> +	while (1) {
>> +		ret = tsm.ops->dev_connect(tdev, tsm.private_data);
>> +		if (ret <= 0)
>> +			break;
>> +
>> +		ret = spdm_forward(&tdev->spdm, ret);
>> +		if (ret < 0)
>> +			break;
>> +	}
>> +	mutex_unlock(&tdev->spdm_mutex);
>> +
>> +	if (!tdev->ide_pre)
>> +		ret = tsm_set_sel_ide(tdev);
>> +
>> +	tdev->connected = (ret == 0);
>> +
>> +	return ret;
>> +}
>> +
>> +static int tsm_dev_reclaim(struct tsm_dev *tdev, void *private_data)
>> +{
>> +	struct pci_dev *pdev = NULL;
>> +	int ret;
>> +
>> +	if (WARN_ON(!tsm.ops->dev_reclaim))
>> +		return -EPERM;
> 
> Similar comment about how this could happen and why crashing the kernel
> is ok.

In this exercise, connect/reclaim are triggered via sysfs, so this can
happen in practice.

And it is WARN_ON, not BUG_ON - is that still called "crashing" (vs.
"panic")? I never thought closely about the distinction.


> 
>> +
>> +	/* Do not disconnect with active TDIs */
>> +	for_each_pci_dev(pdev) {
>> +		struct tsm_tdi *tdi = tsm_tdi_get(&pdev->dev);
>> +
>> +		if (tdi && tdi->tdev == tdev && tdi->data)
>> +			return -EBUSY;
> 
> I would expect that removing things out of order causes violence, not
> blocking it.
> 
> For example you can remove disk drivers while filesystems are still
> mounted. What is the administrator's recourse if they *do* want to
> shutdown the TSM layer all at once?

"rmmod tsm"

>> +	}
>> +
>> +	if (!tdev->ide_pre)
>> +		tsm_unset_sel_ide(tdev);
>> +
>> +	mutex_lock(&tdev->spdm_mutex);
>> +	while (1) {
>> +		ret = tsm.ops->dev_reclaim(tdev, private_data);
>> +		if (ret <= 0)
>> +			break;
> 
> What is the "reclaim" verb? Is this just a destructor? Does "disconnect"
> not sufficiently clean up the device context?
 >
>> +
>> +		ret = spdm_forward(&tdev->spdm, ret);
>> +		if (ret < 0)
>> +			break;
>> +	}
>> +	mutex_unlock(&tdev->spdm_mutex);
>> +
>> +	if (tdev->ide_pre)
>> +		tsm_unset_sel_ide(tdev);
>> +
>> +	if (!ret)
>> +		tdev->connected = false;
>> +
>> +	return ret;
>> +}
>> +
>> +static int tsm_dev_status(struct tsm_dev *tdev, void *private_data, struct tsm_dev_status *s)
>> +{
>> +	if (WARN_ON(!tsm.ops->dev_status))
>> +		return -EPERM;
>> +
>> +	return tsm.ops->dev_status(tdev, private_data, s);
> 
> This is asking for better defined semantics.
> 
>> +}
>> +
>> +static int tsm_ide_refresh(struct tsm_dev *tdev, void *private_data)
>> +{
>> +	int ret;
>> +
>> +	if (!tsm.ops->ide_refresh)
>> +		return -EPERM;
>> +
>> +	mutex_lock(&tdev->spdm_mutex);
>> +	while (1) {
>> +		ret = tsm.ops->ide_refresh(tdev, private_data);
>> +		if (ret <= 0)
>> +			break;
> 
> Why is refresh not "connect"? I.e. connecting an already connected
> device refreshes the connection.

Really not sure about that. Either way I am ditching it for now.

>> +
>> +		ret = spdm_forward(&tdev->spdm, ret);
>> +		if (ret < 0)
>> +			break;
>> +	}
>> +	mutex_unlock(&tdev->spdm_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +static void tsm_tdi_reclaim(struct tsm_tdi *tdi, void *private_data)
>> +{
>> +	int ret;
>> +
>> +	if (WARN_ON(!tsm.ops->tdi_reclaim))
>> +		return;
>> +
>> +	mutex_lock(&tdi->tdev->spdm_mutex);
>> +	while (1) {
>> +		ret = tsm.ops->tdi_reclaim(tdi, private_data);
>> +		if (ret <= 0)
>> +			break;
> 
> What is involved in tdi "reclaim" separately from "unbind"?
> "dev_reclaim" and "tdi_reclaim" seem less precise than "disconnect" and
> "unbind".

The firmware operates at a finer granularity, so there are
create+connect+disconnect+reclaim (for both DEV and TDI). My verb
dictionary evolved from having all of them in tsm_ops to this subset,
where each verb is named after the state it leaves the device in. This
needs correction, yes.


>> +
>> +		ret = spdm_forward(&tdi->tdev->spdm, ret);
>> +		if (ret < 0)
>> +			break;
>> +	}
>> +	mutex_unlock(&tdi->tdev->spdm_mutex);
>> +}
>> +
>> +static int tsm_tdi_validate(struct tsm_tdi *tdi, bool invalidate, void *private_data)
>> +{
>> +	int ret;
>> +
>> +	if (!tdi || !tsm.ops->tdi_validate)
>> +		return -EPERM;
>> +
>> +	ret = tsm.ops->tdi_validate(tdi, invalidate, private_data);
>> +	if (ret) {
>> +		pci_err(tdi->pdev, "Validation failed, ret=%d", ret);
>> +		tdi->pdev->dev.tdi_enabled = false;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +/* In case BUS_NOTIFY_PCI_BUS_MASTER is no good, a driver can call pci_dev_tdi_validate() */
> 
> No. TDISP is a fundamental re-imagining of the PCI device security
> model. It deserves first class support in the PCI core, not bolted on
> support via bus notifiers.

This one is about sequencing. For example, writing a zero to BME breaks
a TDI after it has moved to CONFIG_LOCKED. So we either:
1) prevent zeroing BME, or
2) delay this "validation" step (which also needs a better name).

If 1), then I can call "validate" from the PCI core before the driver's
probe.
If 2), it is either a driver modification to call "validate" explicitly,
or a notifier like this. Or the guest's sysfs - a VM might want to boot
with a "shared" device, get to userspace where some daemon inspects the
certificates/etc. and "validates" the device only if it is happy with
the result. There may even be some vendor-specific device configuration
happening before the validation step.
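
For the driver-modification flavour of 2), I mean something along these
lines (a sketch only; mydrv_probe and the exact placement relative to
pci_set_master() are illustrative, not a proposal):

	static int mydrv_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		int ret = pcim_enable_device(pdev);

		if (ret)
			return ret;
		pci_set_master(pdev);

		/* any vendor-specific "shared" setup would go here */

		ret = pci_dev_tdi_validate(pdev);	/* the "validate" step above */
		if (ret)
			return ret;

		/* from here on the device is expected to be usable as private */
		return 0;
	}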

> 
> [..]
> 
> I hesitate to keep commenting because this is so far off of the lifetime
> and code organization expectations I thought we were negotiating with
> the PCI/TSM series. So I will stop here for now.

Good call, sorry for the mess. Thanks for the review!


ps: I'll just fix the things I did not comment on but I'm not ignoring them.
Dan Williams Sept. 4, 2024, 11:28 p.m. UTC | #16
Alexey Kardashevskiy wrote:
> 
> 
> On 4/9/24 09:51, Dan Williams wrote:
> > Alexey Kardashevskiy wrote:
> >> The module responsibilities are:
> >> 1. detect TEE support in a device and create nodes in the device's sysfs
> >> entry;
> >> 2. allow binding a PCI device to a VM for passing it through in a trusted
> >> manner;
> >> 3. store measurements/certificates/reports and provide access to those for
> >> the userspace via sysfs.
> >>
> >> This relies on the platform to register a set of callbacks,
> >> for both host and guest.
> >>
> >> And tdi_enabled in the device struct.
> > 
> > I had been holding out hope that when I got this patch the changelog
> > would give some justification for what folks had been whispering to me
> > in recent days: "hey Dan, looks like Alexey is completely ignoring the
> > PCI/TSM approach?".
> > 
> > Bjorn acked that approach here:
> > 
> > http://lore.kernel.org/20240419220729.GA307280@bhelgaas
> > 
> > It is in need of a refresh, preview here:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux.git/commit/?id=5807465b92ac
> > 
> > At best, I am disappointed that this RFC ignored it. More comments
> > below, but please do clarify if we are working together on a Bjorn-acked
> > direction, or not.
> 
> Together.
> 
> My problem with that patchset is that it only does connect/disconnect 
> and no TDISP business (and I need both for my exercise) and I was hoping 
> to see some TDISP-aware git tree but this has not happened yet so I 
> postponed rebasing onto it, due to the lack of time and also apparent 
> difference between yours and mine TSMs (and I had mine working before I 
> saw yours and focused on making things work for the starter). Sorry, I 
> should have spoken louder. Or listen better to that whispering. Or 
> rebase earlier.

Ok, this makes sense. This is definitely changelog material to clarify
assumptions, tradeoffs, and direction. The fact that the changelog said
nothing about those was, at a minimum, cause for concern.

[..]
> >> @@ -801,6 +802,7 @@ struct device {
> >>   	void	(*release)(struct device *dev);
> >>   	struct iommu_group	*iommu_group;
> >>   	struct dev_iommu	*iommu;
> >> +	struct tsm_tdi		*tdi;
> > 
> > No. The only known device model for TDIs is PCI devices, i.e. TDISP is a
> > PCI protocol. Even SPDM which is cross device-type generic did not touch
> > 'struct device'.
> 
> TDISP is PCI but DMA is not. This is for:
> [RFC PATCH 19/21] sev-guest: Stop changing encrypted page state for 
> TDISP devices
> 
> DMA layer deals with struct device and tries hard to avoid indirect _ops 
> calls so I was looking for a place for "tdi_enabled" (a bad name, 
> perhaps, may be call it "dma_encrypted", a few lines below).

The name, and the fact that it exposes all of the TSM interfaces to the
driver core, made it unclear whether this oversharing was on purpose or
just for convenience / expediency.

I agree that 'struct device' should carry DMA mapping details, but the
full TDI context is so much more than that, which makes it difficult to
understand the organizing principle of this data sharing.

> the flag and the pointer together for the RFC. I am hoping for a better 
> solution for 19/21, then I am absolutely moving tdi* to pci_dev (well, 
> drop these and just use yours).

Ok, so what patches are in the category of "temporary hacks to get
something going and a plan to replace them", and which are "firm
proposals looking for review feedback"?

[..]
> >> +/**
> >> + * struct tdisp_interface_id - TDISP INTERFACE_ID Definition
> >> + *
> >> + * @function_id: Identifies the function of the device hosting the TDI
> >> + * 15:0: @rid: Requester ID
> >> + * 23:16: @rseg: Requester Segment (Reserved if Requester Segment Valid is Clear)
> >> + * 24: @rseg_valid: Requester Segment Valid
> >> + * 31:25 – Reserved
> >> + * 8B - Reserved
> >> + */
> >> +struct tdisp_interface_id {
> >> +	union {
> >> +		struct {
> >> +			u32 function_id;
> >> +			u8 reserved[8];
> >> +		};
> >> +		struct {
> >> +			u16 rid;
> >> +			u8 rseg;
> >> +			u8 rseg_valid:1;
> > 
> > Linux typically avoids C-bitfields in hardware interfaces in favor of
> > bitfield.h macros.
> > >> +		};
> >> +	};
> >> +} __packed;
> > 
> > Does this need to be "packed"? Looks naturally aligned to pahole.
> 
> "__packed" is also a way to say it is a binary interface, I want to be 
> precise about this.

It's also a way to tell the compiler to turn off useful optimizations.

Don't these also need to be __le32 and __le16 for the multi-byte fields?
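
For the wire-format structs I would expect something along these lines
(illustrative only, not a proposal for the final field/macro names):

	#define TDISP_FUNCTION_ID_RID		GENMASK(15, 0)
	#define TDISP_FUNCTION_ID_RSEG		GENMASK(23, 16)
	#define TDISP_FUNCTION_ID_RSEG_VALID	BIT(24)

	struct tdisp_interface_id {
		__le32 function_id;
		u8 reserved[8];
	};

	/* and then, for example: */
	u16 rid = FIELD_GET(TDISP_FUNCTION_ID_RID,
			    le32_to_cpu(id->function_id));

...which keeps the layout explicit and behaves the same on either
endianness.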

[..]
> > Same C-bitfield comment, as before, and what about big endian hosts?
> 
> Right, I'll get rid of c-bitfields in the common parts.
> 
> Although I am curious what big-endian platform is going to actually 
> support this.

The PCI DOE and CMA code is cross-CPU generic with endian annotations
where needed. Why would PCI TSM code get away with kicking that analysis
down the road?

[..]
> >> +/* Physical device descriptor responsible for IDE/TDISP setup */
> >> +struct tsm_dev {
> >> +	struct kref kref;
> > 
> > Another kref that begs the question why would a tsm_dev need its own
> > lifetime? This also goes back to the organization in the PCI/TSM
> > proposal that all TSM objects are at max bound to the lifetime of
> > whatever is shorter, the registration of the low-level TSM driver or the
> > PCI device itself.
> 
> 
> That proposal deals with PFs for now and skips TDIs. Since TDI needs its 
> place in pci_dev too, and I wanted to add the bare minimum to struct 
> device or pci_dev, I only add TDIs and each of them references a DEV. 
> Enough to get me going.

Fine for an RFC, but again please be upfront about what is firmer for
deeper scrutiny and what is softer to get the RFC standing.

> >> +	const struct attribute_group *ag;
> > 
> > PCI device attribute groups are already conveyed in a well known
> > (lifetime and user visibility) manner. What is motivating this
> > "re-imagining"?
> > 
> >> +	struct pci_dev *pdev; /* Physical PCI function #0 */
> >> +	struct tsm_spdm spdm;
> >> +	struct mutex spdm_mutex;
> > 
> > Is an spdm lock sufficient? I expect the device needs to serialize all
> > TSM communications, not just spdm? Documentation of the locking would
> > help.
> 
> What other communication do you mean here?

For example, a lock protecting entry into tsm_ops->connect(...); if
that operation is locked, does there need to be a lower-level spdm
locking context?

[..]
> >> +/*
> >> + * Enables IDE between the RC and the device.
> >> + * TEE Limited, IDE Cfg space and other bits are hardcoded
> >> + * as this is a sketch.
> > 
> > It would help to know how in depth to review the pieces if there were
> > more pointers of "this is serious proposal", and "this is a sketch".
> 
> Largely the latter, remember to keep appreciating the "release early" 
> aspect of it :)
> 
> It is a sketch which has been tested on the hardware with both KVM and 
> SNP VM which (I thought) has some value if posted before the LPC. I 
> should have made it clearer though.

It is definitely useful for getting the conversation started, but maybe
we need a SubmittingPatches-style document that clarifies that RFCs need
to be explicit about whether and where reviewers should spend their time.

[..]
> > This feels kludgy. IDE is a fundamental mechanism of a PCI device why
> > would a PCI core helper not know how to extract the settings from a
> > pdev?
> > 
> > Something like:
> > 
> > pci_ide_setup_stream(pdev, i)
> 
> 
> It is unclear to me how we go about what stream(s) need(s) enabling and 
> what flags to set. Who decides - a driver? a daemon/user?

That is a good topic for the design document that Jason wanted. I had
been expecting that, since stream IDs are a limited resource, the kernel
needs to depend on userspace to handle allocation conflicts. Most of the
other settings would seem to be PCI core defaults unless and until
someone can point to a use case for a driver or userspace to have a
different opinion about those settings.

> >> +		if (ret) {
> >> +			pci_warn(tdev->pdev,
> >> +				 "Failed configuring SelectiveIDE#%d with %d\n",
> >> +				 i, ret);
> >> +			break;
> >> +		}
> >> +
> >> +		ret = pci_ide_set_sel_rid_assoc(rootport, i, true, 0, 0, 0xFFFF);
> >> +		if (ret)
> >> +			pci_warn(rootport,
> >> +				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
> >> +				 i, ret);
> >> +
> >> +		ret = pci_ide_set_sel(rootport, i,
> > 
> > Perhaps:
> > 
> > pci_ide_host_setup_stream(pdev, i)
> > 
> > ...I expect the helper should be able to figure out the rootport and RID
> > association.
> 
> Where will the helper get the properties from?

I expect it can retrieve it out of @pdev since the IDE settings belong
in 'struct pci_dev'.
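
Something like this (hypothetical, just to show where the data would
live; the member names are made up):

	/* would hang off 'struct pci_dev', one entry per Selective IDE stream */
	struct pci_ide_sel {
		u8 id;
		u8 enable:1, def:1, tee_limited:1, ide_cfg:1;
	};

...so that pci_ide_setup_stream(pdev, i) has everything it needs without
a 'struct tsm_dev' in the picture.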

[..]
> >> +static int tsm_dev_reclaim(struct tsm_dev *tdev, void *private_data)
> >> +{
> >> +	struct pci_dev *pdev = NULL;
> >> +	int ret;
> >> +
> >> +	if (WARN_ON(!tsm.ops->dev_reclaim))
> >> +		return -EPERM;
> > 
> > Similar comment about how this could happen and why crashing the kernel
> > is ok.
> 
> In this exercise, connect/reclaim are triggered via sysfs so this can 
> happen in my practice.
> 
> And it is WARN_ON, not BUG_ON, is it still called "crashing" (vs. 
> "panic", I never closely thought about it)?

You will see folks like Greg raise the concern that many users run with
"panic_on_warn" enabled. I expect a confidential VM is well advised to
enable that.

If it is a "can't ever happen outside of a kernel developer mistake"
then maybe WARN_ON() is ok, and you will see folks like Christoph assert
that WARN_ON() is good for that, but it should be reserved for cases
where rebooting might be a good idea if it fires.

> >> +
> >> +	/* Do not disconnect with active TDIs */
> >> +	for_each_pci_dev(pdev) {
> >> +		struct tsm_tdi *tdi = tsm_tdi_get(&pdev->dev);
> >> +
> >> +		if (tdi && tdi->tdev == tdev && tdi->data)
> >> +			return -EBUSY;
> > 
> > I would expect that removing things out of order causes violence, not
> > blocking it.
> > 
> > For example you can remove disk drivers while filesystems are still
> > mounted. What is the administrator's recourse if they *do* want to
> > shutdown the TSM layer all at once?
> 
> "rmmod tsm"

Is tsm_dev_reclaim() triggered by "rmmod tsm"? The concern is how to
reclaim when tsm_dev_reclaim() is sometimes returning EBUSY. Just as the
driver core enforces that driver unbind must succeed, so should TSM
shutdown.

Also, the proposal Bjorn acked, because it comports with PCI sysfs
lifetime and visibility expectations, is that the TSM core is part of
the PCI core, just like DOE and CMA. The proposed way to shut down TSM
operations is to unbind the low-level TSM driver (TIO, TDX-Connect,
etc...) and that will forcefully destruct all TDI contexts with no
dangling -EBUSY cases.

Maybe tsm_dev_reclaim() is not triggered by TSM shutdown, but TSM
shutdown, like 'struct device_driver'.remove(), should return 'void'.
Note, I know that 'struct device_driver' is not quite there yet on
->remove() returning 'void' instead of 'int', but that is the direction.

[..]
> > Why is refresh not "connect"? I.e. connecting an already connected
> > device refreshes the connection.
> 
> Really not sure about that. Either way I am ditching it for now.

Yeah, let's aggressively defer incremental features.

> >> +		ret = spdm_forward(&tdev->spdm, ret);
> >> +		if (ret < 0)
> >> +			break;
> >> +	}
> >> +	mutex_unlock(&tdev->spdm_mutex);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static void tsm_tdi_reclaim(struct tsm_tdi *tdi, void *private_data)
> >> +{
> >> +	int ret;
> >> +
> >> +	if (WARN_ON(!tsm.ops->tdi_reclaim))
> >> +		return;
> >> +
> >> +	mutex_lock(&tdi->tdev->spdm_mutex);
> >> +	while (1) {
> >> +		ret = tsm.ops->tdi_reclaim(tdi, private_data);
> >> +		if (ret <= 0)
> >> +			break;
> > 
> > What is involved in tdi "reclaim" separately from "unbind"?
> > "dev_reclaim" and "tdi_reclaim" seem less precise than "disconnect" and
> > "unbind".
> 
> The firmware operates at the finer granularity so there are 
> create+connect+disconnect+reclaim (for DEV and TDI). My verbs dictionary 
> evolved from having all of them in the tsm_ops to this subset which 
> tells the state the verb leaves the device at. This needs correction, yes.

I like the simplicity of the TIO verbs, but that does not preclude the
Linux verbs from having even coarser semantics.

[..]
> >> +/* In case BUS_NOTIFY_PCI_BUS_MASTER is no good, a driver can call pci_dev_tdi_validate() */
> > 
> > No. TDISP is a fundamental re-imagining of the PCI device security
> > model. It deserves first class support in the PCI core, not bolted on
> > support via bus notifiers.
> 
> This one is about sequencing. For example, writing a zero to BME breaks 
> a TDI after it moved to CONFIG_LOCKED. So, we either:
> 1) prevent zeroing BME or
> 2) delay this "validation" step (which also needs a better name).
> 
> If 1), then I can call "validate" from the PCI core before the driver's 
> probe.
> If 2), it is either a driver modification to call "validate" explicitly 
> or have a notifier like this. Or guest's sysfs - as a VM might want to 
> boot with a "shared" device, get to the userspace where some daemon 
> inspects the certificates/etc and "validates" the device only if it is 
> happy with the result. There may be even some vendor-specific device 
> configuration happening before the validation step.

Right, the guest might need to operate the device in shared mode to get
it ready for validation. At that point locking and validating the device
needs to be triggered by userspace talking to the PCI core before
reloading the driver to operate the device in private mode. That
conversion is probably best modeled as a hotplug event to leave the
shared world and enter the secured world.

That likely means that the userspace operation to transition the device
to LOCKED also needs to take care of enabling BME and MSE independent of
any driver just based on the interface report. Then, loading the driver
can take the device from LOCKED to RUN when ready.

Yes, that implies an enlightened driver, for simplicity. We could later 
think about auto-validating devices by pre-loading golden measurements
into the kernel, but I expect the common case is that userspace needs to
do a bunch of work with the device-evidence and the verifier to get
itself comfortable with allowing the device to transition to the RUN
state.

> > I hesitate to keep commenting because this is so far off of the lifetime
> > and code organization expectations I thought we were negotiating with
> > the PCI/TSM series. So I will stop here for now.
> 
> Good call, sorry for the mess. Thanks for the review!

No harm done. The code is useful and the disconnect on the communication
/ documentation is now understood.

> ps: I'll just fix the things I did not comment on but I'm not ignoring them.

Sounds good.
diff mbox series

Patch

diff --git a/drivers/virt/coco/Makefile b/drivers/virt/coco/Makefile
index 75defec514f8..5d1aefb62714 100644
--- a/drivers/virt/coco/Makefile
+++ b/drivers/virt/coco/Makefile
@@ -3,6 +3,7 @@ 
 # Confidential computing related collateral
 #
 obj-$(CONFIG_TSM_REPORTS)	+= tsm-report.o
+obj-$(CONFIG_TSM)		+= tsm.o
 obj-$(CONFIG_EFI_SECRET)	+= efi_secret/
 obj-$(CONFIG_SEV_GUEST)		+= sev-guest/
 obj-$(CONFIG_INTEL_TDX_GUEST)	+= tdx-guest/
diff --git a/include/linux/device.h b/include/linux/device.h
index 34eb20f5966f..bb58ed1fb8da 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -45,6 +45,7 @@  struct fwnode_handle;
 struct iommu_group;
 struct dev_pin_info;
 struct dev_iommu;
+struct tsm_tdi;
 struct msi_device_data;
 
 /**
@@ -801,6 +802,7 @@  struct device {
 	void	(*release)(struct device *dev);
 	struct iommu_group	*iommu_group;
 	struct dev_iommu	*iommu;
+	struct tsm_tdi		*tdi;
 
 	struct device_physical_location *physical_location;
 
@@ -822,6 +824,9 @@  struct device {
 #ifdef CONFIG_DMA_NEED_SYNC
 	bool			dma_skip_sync:1;
 #endif
+#if defined(CONFIG_TSM) || defined(CONFIG_TSM_MODULE)
+	bool			tdi_enabled:1;
+#endif
 };
 
 /**
diff --git a/include/linux/tsm.h b/include/linux/tsm.h
new file mode 100644
index 000000000000..d48eceaf5bc0
--- /dev/null
+++ b/include/linux/tsm.h
@@ -0,0 +1,263 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef LINUX_TSM_H
+#define LINUX_TSM_H
+
+#include <linux/cdev.h>
+
+/* SPDM control structure for DOE */
+struct tsm_spdm {
+	unsigned long req_len;
+	void *req;
+	unsigned long rsp_len;
+	void *rsp;
+
+	struct pci_doe_mb *doe_mb;
+	struct pci_doe_mb *doe_mb_secured;
+};
+
+/* Data object for measurements/certificates/attestationreport */
+struct tsm_blob {
+	void *data;
+	size_t len;
+	struct kref kref;
+	void (*release)(struct tsm_blob *b);
+};
+
+struct tsm_blob *tsm_blob_new(void *data, size_t len, void (*release)(struct tsm_blob *b));
+struct tsm_blob *tsm_blob_get(struct tsm_blob *b);
+void tsm_blob_put(struct tsm_blob *b);
+
+/**
+ * struct tdisp_interface_id - TDISP INTERFACE_ID Definition
+ *
+ * @function_id: Identifies the function of the device hosting the TDI
+ * 15:0: @rid: Requester ID
+ * 23:16: @rseg: Requester Segment (Reserved if Requester Segment Valid is Clear)
+ * 24: @rseg_valid: Requester Segment Valid
+ * 31:25 – Reserved
+ * 8B - Reserved
+ */
+struct tdisp_interface_id {
+	union {
+		struct {
+			u32 function_id;
+			u8 reserved[8];
+		};
+		struct {
+			u16 rid;
+			u8 rseg;
+			u8 rseg_valid:1;
+		};
+	};
+} __packed;
+
+/*
+ * Measurement block as defined in SPDM DSP0274.
+ */
+struct spdm_measurement_block_header {
+	u8 index;
+	u8 spec; /* MeasurementSpecification */
+	u16 size;
+} __packed;
+
+struct dmtf_measurement_block_header {
+	u8 type;  /* DMTFSpecMeasurementValueType */
+	u16 size; /* DMTFSpecMeasurementValueSize */
+} __packed;
+
+struct dmtf_measurement_block_device_mode {
+	u32 opmode_cap;	 /* OperationalModeCapabilties */
+	u32 opmode_sta;  /* OperationalModeState */
+	u32 devmode_cap; /* DeviceModeCapabilties */
+	u32 devmode_sta; /* DeviceModeState */
+} __packed;
+
+struct spdm_certchain_block_header {
+	u16 length;
+	u16 reserved;
+} __packed;
+
+/*
+ * TDI Report Structure as defined in TDISP.
+ */
+struct tdi_report_header {
+	union {
+		u16 interface_info;
+		struct {
+			u16 no_fw_update:1; /* fw updates not permitted in CONFIG_LOCKED or RUN */
+			u16 dma_no_pasid:1; /* TDI generates DMA requests without PASID */
+			u16 dma_pasid:1; /* TDI generates DMA requests with PASID */
+			u16 ats:1; /*  ATS supported and enabled for the TDI */
+			u16 prs:1; /*  PRS supported and enabled for the TDI */
+			u16 reserved1:11;
+		};
+	};
+	u16 reserved2;
+	u16 msi_x_message_control;
+	u16 lnr_control;
+	u32 tph_control;
+	u32 mmio_range_count;
+} __packed;
+
+/*
+ * Each MMIO Range of the TDI is reported with the MMIO reporting offset added.
+ * Base and size in units of 4K pages
+ */
+struct tdi_report_mmio_range {
+	u64 first_page; /* First 4K page with offset added */
+	u32 num; 	/* Number of 4K pages in this range */
+	union {
+		u32 range_attributes;
+		struct {
+			u32 msix_table:1;
+			u32 msix_pba:1;
+			u32 is_non_tee_mem:1;
+			u32 is_mem_attr_updatable:1;
+			u32 reserved:12;
+			u32 range_id:16;
+		};
+	};
+} __packed;
+
+struct tdi_report_footer {
+	u32 device_specific_info_len;
+	u8 device_specific_info[];
+} __packed;
+
+#define TDI_REPORT_HDR(rep)		((struct tdi_report_header *) ((rep)->data))
+#define TDI_REPORT_MR_NUM(rep)		(TDI_REPORT_HDR(rep)->mmio_range_count)
+#define TDI_REPORT_MR_OFF(rep)		((struct tdi_report_mmio_range *) (TDI_REPORT_HDR(rep) + 1))
+#define TDI_REPORT_MR(rep, rangeid)	TDI_REPORT_MR_OFF(rep)[rangeid]
+#define TDI_REPORT_FTR(rep)		((struct tdi_report_footer *) &TDI_REPORT_MR((rep), \
+					TDI_REPORT_MR_NUM(rep)))
+
+/* Physical device descriptor responsible for IDE/TDISP setup */
+struct tsm_dev {
+	struct kref kref;
+	const struct attribute_group *ag;
+	struct pci_dev *pdev; /* Physical PCI function #0 */
+	struct tsm_spdm spdm;
+	struct mutex spdm_mutex;
+
+	u8 tc_mask;
+	u8 cert_slot;
+	u8 connected;
+	struct {
+		u8 enabled:1;
+		u8 enable:1;
+		u8 def:1;
+		u8 dev_ide_cfg:1;
+		u8 dev_tee_limited:1;
+		u8 rootport_ide_cfg:1;
+		u8 rootport_tee_limited:1;
+		u8 id;
+	} selective_ide[256];
+	bool ide_pre;
+
+	struct tsm_blob *meas;
+	struct tsm_blob *certs;
+
+	void *data; /* Platform specific data */
+};
+
+/* PCI function for passing through, can be the same as tsm_dev::pdev */
+struct tsm_tdi {
+	const struct attribute_group *ag;
+	struct pci_dev *pdev;
+	struct tsm_dev *tdev;
+
+	u8 rseg;
+	u8 rseg_valid;
+	bool validated;
+
+	struct tsm_blob *report;
+
+	void *data; /* Platform specific data */
+
+	u64 vmid;
+	u32 asid;
+	u16 guest_rid; /* BDFn of PCI Fn in the VM */
+};
+
+struct tsm_dev_status {
+	u8 ctx_state;
+	u8 tc_mask;
+	u8 certs_slot;
+	u16 device_id;
+	u16 segment_id;
+	u8 no_fw_update;
+	u16 ide_stream_id[8];
+};
+
+enum tsm_spdm_algos {
+	TSM_TDI_SPDM_ALGOS_DHE_SECP256R1,
+	TSM_TDI_SPDM_ALGOS_DHE_SECP384R1,
+	TSM_TDI_SPDM_ALGOS_AEAD_AES_128_GCM,
+	TSM_TDI_SPDM_ALGOS_AEAD_AES_256_GCM,
+	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_RSASSA_3072,
+	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_ECDSA_ECC_NIST_P256,
+	TSM_TDI_SPDM_ALGOS_ASYM_TPM_ALG_ECDSA_ECC_NIST_P384,
+	TSM_TDI_SPDM_ALGOS_HASH_TPM_ALG_SHA_256,
+	TSM_TDI_SPDM_ALGOS_HASH_TPM_ALG_SHA_384,
+	TSM_TDI_SPDM_ALGOS_KEY_SCHED_SPDM_KEY_SCHEDULE,
+};
+
+enum tsm_tdisp_state {
+	TDISP_STATE_UNAVAIL,
+	TDISP_STATE_CONFIG_UNLOCKED,
+	TDISP_STATE_CONFIG_LOCKED,
+	TDISP_STATE_RUN,
+	TDISP_STATE_ERROR,
+};
+
+struct tsm_tdi_status {
+	bool valid;
+	u8 meas_digest_fresh:1;
+	u8 meas_digest_valid:1;
+	u8 all_request_redirect:1;
+	u8 bind_p2p:1;
+	u8 lock_msix:1;
+	u8 no_fw_update:1;
+	u16 cache_line_size;
+	u64 spdm_algos; /* Bitmask of tsm_spdm_algos */
+	u8 certs_digest[48];
+	u8 meas_digest[48];
+	u8 interface_report_digest[48];
+
+	/* HV only */
+	struct tdisp_interface_id id;
+	u8 guest_report_id[16];
+	enum tsm_tdisp_state state;
+};
+
+struct tsm_ops {
+	/* HV hooks */
+	int (*dev_connect)(struct tsm_dev *tdev, void *private_data);
+	int (*dev_reclaim)(struct tsm_dev *tdev, void *private_data);
+	int (*dev_status)(struct tsm_dev *tdev, void *private_data, struct tsm_dev_status *s);
+	int (*ide_refresh)(struct tsm_dev *tdev, void *private_data);
+	int (*tdi_bind)(struct tsm_tdi *tdi, u32 bdfn, u64 vmid, u32 asid, void *private_data);
+	int (*tdi_reclaim)(struct tsm_tdi *tdi, void *private_data);
+
+	int (*guest_request)(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, void *req_data,
+			     enum tsm_tdisp_state *state, void *private_data);
+
+	/* VM hooks */
+	int (*tdi_validate)(struct tsm_tdi *tdi, bool invalidate, void *private_data);
+
+	/* HV and VM hooks */
+	int (*tdi_status)(struct tsm_tdi *tdi, void *private_data, struct tsm_tdi_status *ts);
+};
+
+void tsm_set_ops(struct tsm_ops *ops, void *private_data);
+struct tsm_tdi *tsm_tdi_get(struct device *dev);
+int tsm_tdi_bind(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, u32 asid);
+void tsm_tdi_unbind(struct tsm_tdi *tdi);
+int tsm_guest_request(struct tsm_tdi *tdi, enum tsm_tdisp_state *state, void *req_data);
+struct tsm_tdi *tsm_tdi_find(u32 guest_rid, u64 vmid);
+
+int pci_dev_tdi_validate(struct pci_dev *pdev);
+ssize_t tsm_report_gen(struct tsm_blob *report, char *b, size_t len);
+
+#endif /* LINUX_TSM_H */
diff --git a/drivers/virt/coco/tsm.c b/drivers/virt/coco/tsm.c
new file mode 100644
index 000000000000..e90455a0267f
--- /dev/null
+++ b/drivers/virt/coco/tsm.c
@@ -0,0 +1,1336 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pci-doe.h>
+#include <linux/pci-ide.h>
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/tsm.h>
+#include <linux/kvm_host.h>
+
+#define DRIVER_VERSION	"0.1"
+#define DRIVER_AUTHOR	"aik@amd.com"
+#define DRIVER_DESC	"TSM TDISP driver"
+
+static struct {
+	struct tsm_ops *ops;
+	void *private_data;
+
+	uint tc_mask;
+	uint cert_slot;
+	bool physfn;
+} tsm;
+
+module_param_named(tc_mask, tsm.tc_mask, uint, 0644);
+MODULE_PARM_DESC(tc_mask, "Mask of traffic classes enabled in the device");
+
+module_param_named(cert_slot, tsm.cert_slot, uint, 0644);
+MODULE_PARM_DESC(cert_slot, "Slot number of the certificate requested for constructing the SPDM session");
+
+module_param_named(physfn, tsm.physfn, bool, 0644);
+MODULE_PARM_DESC(physfn, "Allow TDI on SR IOV of a physical function");
+
+struct tsm_blob *tsm_blob_new(void *data, size_t len, void (*release)(struct tsm_blob *b))
+{
+	struct tsm_blob *b;
+
+	if (!len || !data)
+		return NULL;
+
+	b = kzalloc(sizeof(*b) + len, GFP_KERNEL);
+	if (!b)
+		return NULL;
+
+	b->data = (void *)b + sizeof(*b);
+	b->len = len;
+	b->release = release;
+	memcpy(b->data, data, len);
+	kref_init(&b->kref);
+
+	return b;
+}
+EXPORT_SYMBOL_GPL(tsm_blob_new);
+
+static void tsm_blob_release(struct kref *kref)
+{
+	struct tsm_blob *b = container_of(kref, struct tsm_blob, kref);
+
+	b->release(b);
+	kfree(b);
+}
+
+struct tsm_blob *tsm_blob_get(struct tsm_blob *b)
+{
+	if (!b)
+		return NULL;
+
+	if (!kref_get_unless_zero(&b->kref))
+		return NULL;
+
+	return b;
+}
+EXPORT_SYMBOL_GPL(tsm_blob_get);
+
+void tsm_blob_put(struct tsm_blob *b)
+{
+	if (!b)
+		return;
+
+	kref_put(&b->kref, tsm_blob_release);
+}
+EXPORT_SYMBOL_GPL(tsm_blob_put);
+
+static struct tsm_dev *tsm_dev_get(struct device *dev)
+{
+	struct tsm_tdi *tdi = dev->tdi;
+
+	if (!tdi || !tdi->tdev || !kref_get_unless_zero(&tdi->tdev->kref))
+		return NULL;
+
+	return tdi->tdev;
+}
+
+static void tsm_dev_free(struct kref *kref);
+static void tsm_dev_put(struct tsm_dev *tdev)
+{
+	kref_put(&tdev->kref, tsm_dev_free);
+}
+
+struct tsm_tdi *tsm_tdi_get(struct device *dev)
+{
+	struct tsm_tdi *tdi = dev->tdi;
+
+	return tdi;
+}
+EXPORT_SYMBOL_GPL(tsm_tdi_get);
+
+static int spdm_forward(struct tsm_spdm *spdm, u8 type)
+{
+	struct pci_doe_mb *doe_mb;
+	int rc;
+
+	if (type == PCI_DOE_PROTOCOL_SECURED_CMA_SPDM)
+		doe_mb = spdm->doe_mb_secured;
+	else if (type == PCI_DOE_PROTOCOL_CMA_SPDM)
+		doe_mb = spdm->doe_mb;
+	else
+		return -EINVAL;
+
+	if (!doe_mb)
+		return -EFAULT;
+
+	rc = pci_doe(doe_mb, PCI_VENDOR_ID_PCI_SIG, type,
+		     spdm->req, spdm->req_len, spdm->rsp, spdm->rsp_len);
+	if (rc >= 0)
+		spdm->rsp_len = rc;
+
+	return rc;
+}
+
+/*
+ * Enables IDE between the RC and the device.
+ * TEE Limited, IDE Cfg space and other bits are hardcoded
+ * as this is a sketch.
+ */
+static int tsm_set_sel_ide(struct tsm_dev *tdev)
+{
+	struct pci_dev *rootport;
+	bool printed = false;
+	unsigned int i;
+	int ret = 0;
+
+	rootport = tdev->pdev->bus->self;
+	for (i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
+		if (!tdev->selective_ide[i].enable)
+			continue;
+
+		if (!printed) {
+			pci_info(rootport, "Configuring IDE with %s\n",
+				 pci_name(tdev->pdev));
+			printed = true;
+		}
+		WARN_ON_ONCE(tdev->selective_ide[i].enabled);
+
+		ret = pci_ide_set_sel_rid_assoc(tdev->pdev, i, true, 0, 0, 0xFFFF);
+		if (ret)
+			pci_warn(tdev->pdev,
+				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
+				 i, ret);
+		ret = pci_ide_set_sel_addr_assoc(tdev->pdev, i, 0/* RID# */, true,
+						 0, 0xFFFFFFFFFFF00000ULL);
+		if (ret)
+			pci_warn(tdev->pdev,
+				 "Failed configuring SelectiveIDE#%d RID#0 with %d\n",
+				 i, ret);
+
+		ret = pci_ide_set_sel(tdev->pdev, i,
+				      tdev->selective_ide[i].id,
+				      tdev->selective_ide[i].enable,
+				      tdev->selective_ide[i].def,
+				      tdev->selective_ide[i].dev_tee_limited,
+				      tdev->selective_ide[i].dev_ide_cfg);
+		if (ret) {
+			pci_warn(tdev->pdev,
+				 "Failed configuring SelectiveIDE#%d with %d\n",
+				 i, ret);
+			break;
+		}
+
+		ret = pci_ide_set_sel_rid_assoc(rootport, i, true, 0, 0, 0xFFFF);
+		if (ret)
+			pci_warn(rootport,
+				 "Failed configuring SelectiveIDE#%d rid1 with %d\n",
+				 i, ret);
+
+		ret = pci_ide_set_sel(rootport, i,
+				      tdev->selective_ide[i].id,
+				      tdev->selective_ide[i].enable,
+				      tdev->selective_ide[i].def,
+				      tdev->selective_ide[i].rootport_tee_limited,
+				      tdev->selective_ide[i].rootport_ide_cfg);
+		if (ret)
+			pci_warn(rootport,
+				 "Failed configuring SelectiveIDE#%d with %d\n",
+				 i, ret);
+
+		tdev->selective_ide[i].enabled = 1;
+	}
+
+	return ret;
+}
+
+static void tsm_unset_sel_ide(struct tsm_dev *tdev)
+{
+	struct pci_dev *rootport = tdev->pdev->bus->self;
+	bool printed = false;
+
+	for (unsigned int i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
+		if (!tdev->selective_ide[i].enabled)
+			continue;
+
+		if (!printed) {
+			pci_info(rootport, "Deconfiguring IDE with %s\n", pci_name(tdev->pdev));
+			printed = true;
+		}
+
+		pci_ide_set_sel(rootport, i, 0, 0, 0, false, false);
+		pci_ide_set_sel(tdev->pdev, i, 0, 0, 0, false, false);
+		tdev->selective_ide[i].enabled = 0;
+	}
+}
+
+static int tsm_dev_connect(struct tsm_dev *tdev, void *private_data, unsigned int val)
+{
+	int ret;
+
+	if (WARN_ON(!tsm.ops->dev_connect))
+		return -EPERM;
+
+	tdev->ide_pre = val == 2;
+	if (tdev->ide_pre)
+		tsm_set_sel_ide(tdev);
+
+	mutex_lock(&tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->dev_connect(tdev, tsm.private_data);
+		if (ret <= 0)
+			break;
+
+		ret = spdm_forward(&tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdev->spdm_mutex);
+
+	if (!tdev->ide_pre)
+		ret = tsm_set_sel_ide(tdev);
+
+	tdev->connected = (ret == 0);
+
+	return ret;
+}
+
+static int tsm_dev_reclaim(struct tsm_dev *tdev, void *private_data)
+{
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	if (WARN_ON(!tsm.ops->dev_reclaim))
+		return -EPERM;
+
+	/* Do not disconnect with active TDIs */
+	for_each_pci_dev(pdev) {
+		struct tsm_tdi *tdi = tsm_tdi_get(&pdev->dev);
+
+		if (tdi && tdi->tdev == tdev && tdi->data)
+			return -EBUSY;
+	}
+
+	if (!tdev->ide_pre)
+		tsm_unset_sel_ide(tdev);
+
+	mutex_lock(&tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->dev_reclaim(tdev, private_data);
+		if (ret <= 0)
+			break;
+
+		ret = spdm_forward(&tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdev->spdm_mutex);
+
+	if (tdev->ide_pre)
+		tsm_unset_sel_ide(tdev);
+
+	if (!ret)
+		tdev->connected = false;
+
+	return ret;
+}
+
+static int tsm_dev_status(struct tsm_dev *tdev, void *private_data, struct tsm_dev_status *s)
+{
+	if (WARN_ON(!tsm.ops->dev_status))
+		return -EPERM;
+
+	return tsm.ops->dev_status(tdev, private_data, s);
+}
+
+static int tsm_ide_refresh(struct tsm_dev *tdev, void *private_data)
+{
+	int ret;
+
+	if (!tsm.ops->ide_refresh)
+		return -EPERM;
+
+	mutex_lock(&tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->ide_refresh(tdev, private_data);
+		if (ret <= 0)
+			break;
+
+		ret = spdm_forward(&tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdev->spdm_mutex);
+
+	return ret;
+}
+
+static void tsm_tdi_reclaim(struct tsm_tdi *tdi, void *private_data)
+{
+	int ret;
+
+	if (WARN_ON(!tsm.ops->tdi_reclaim))
+		return;
+
+	mutex_lock(&tdi->tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->tdi_reclaim(tdi, private_data);
+		if (ret <= 0)
+			break;
+
+		ret = spdm_forward(&tdi->tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdi->tdev->spdm_mutex);
+}
+
+static int tsm_tdi_validate(struct tsm_tdi *tdi, bool invalidate, void *private_data)
+{
+	int ret;
+
+	if (!tdi || !tsm.ops->tdi_validate)
+		return -EPERM;
+
+	ret = tsm.ops->tdi_validate(tdi, invalidate, private_data);
+	if (ret) {
+		pci_err(tdi->pdev, "Validation failed, ret=%d", ret);
+		tdi->pdev->dev.tdi_enabled = false;
+	}
+
+	return ret;
+}
+
+/* In case BUS_NOTIFY_PCI_BUS_MASTER is no good, a driver can call pci_dev_tdi_validate() */
+int pci_dev_tdi_validate(struct pci_dev *pdev)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(&pdev->dev);
+
+	return tsm_tdi_validate(tdi, false, tsm.private_data);
+}
+EXPORT_SYMBOL_GPL(pci_dev_tdi_validate);
+
+static int tsm_tdi_status(struct tsm_tdi *tdi, void *private_data, struct tsm_tdi_status *ts)
+{
+	struct tsm_tdi_status tstmp = { 0 };
+	int ret;
+
+	if (WARN_ON(!tsm.ops->tdi_status))
+		return -EPERM;
+
+	mutex_lock(&tdi->tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->tdi_status(tdi, private_data, &tstmp);
+		if (ret <= 0)
+			break;
+
+		ret = spdm_forward(&tdi->tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdi->tdev->spdm_mutex);
+
+	*ts = tstmp;
+
+	return ret;
+}
+
+static ssize_t tsm_cert_slot_store(struct device *dev, struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t ret = count;
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		ret = -EINVAL;
+	else
+		tdev->cert_slot = val;
+
+	tsm_dev_put(tdev);
+
+	return ret;
+}
+
+static ssize_t tsm_cert_slot_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t ret = sysfs_emit(buf, "%u\n", tdev->cert_slot);
+
+	tsm_dev_put(tdev);
+	return ret;
+}
+
+static DEVICE_ATTR_RW(tsm_cert_slot);
+
+static ssize_t tsm_tc_mask_store(struct device *dev, struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t ret = count;
+	unsigned long val;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		ret = -EINVAL;
+	else
+		tdev->tc_mask = val;
+	tsm_dev_put(tdev);
+
+	return ret;
+}
+
+static ssize_t tsm_tc_mask_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t ret = sysfs_emit(buf, "%#x\n", tdev->tc_mask);
+
+	tsm_dev_put(tdev);
+	return ret;
+}
+
+static DEVICE_ATTR_RW(tsm_tc_mask);
+
+static ssize_t tsm_dev_connect_store(struct device *dev, struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	unsigned long val;
+	ssize_t ret = -EIO;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		ret = -EINVAL;
+	else if (val && !tdev->connected)
+		ret = tsm_dev_connect(tdev, tsm.private_data, val);
+	else if (!val && tdev->connected)
+		ret = tsm_dev_reclaim(tdev, tsm.private_data);
+
+	if (!ret)
+		ret = count;
+
+	tsm_dev_put(tdev);
+
+	return ret;
+}
+
+static ssize_t tsm_dev_connect_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t ret = sysfs_emit(buf, "%u\n", tdev->connected);
+
+	tsm_dev_put(tdev);
+	return ret;
+}
+
+static DEVICE_ATTR_RW(tsm_dev_connect);
+
+static ssize_t tsm_sel_stream_store(struct device *dev, struct device_attribute *attr,
+				    const char *buf, size_t count)
+{
+	unsigned int ide_dev = false, tee_dev = true, ide_rp = true, tee_rp = false;
+	unsigned int sel_index, id, def, en;
+	struct tsm_dev *tdev;
+
+	if (sscanf(buf, "%u %u %u %u %u %u %u %u", &sel_index, &id, &def, &en,
+		   &ide_dev, &tee_dev, &ide_rp, &tee_rp) != 8) {
+		if (sscanf(buf, "%u %u %u %u", &sel_index, &id, &def, &en) != 4)
+			return -EINVAL;
+	}
+
+	if (sel_index >= ARRAY_SIZE(tdev->selective_ide) || id > 0x100)
+		return -EINVAL;
+
+	tdev = tsm_dev_get(dev);
+	if (en) {
+		tdev->selective_ide[sel_index].id = id;
+		tdev->selective_ide[sel_index].def = def;
+		tdev->selective_ide[sel_index].enable = 1;
+		tdev->selective_ide[sel_index].enabled = 0;
+		tdev->selective_ide[sel_index].dev_ide_cfg = ide_dev;
+		tdev->selective_ide[sel_index].dev_tee_limited = tee_dev;
+		tdev->selective_ide[sel_index].rootport_ide_cfg = ide_rp;
+		tdev->selective_ide[sel_index].rootport_tee_limited = tee_rp;
+	} else {
+		memset(&tdev->selective_ide[sel_index], 0, sizeof(tdev->selective_ide[0]));
+	}
+
+	tsm_dev_put(tdev);
+	return count;
+}
+
+static ssize_t tsm_sel_stream_show(struct device *dev, struct device_attribute *attr,
+				   char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	struct pci_dev *rootport = tdev->pdev->bus->self;
+	unsigned int i;
+	char *buf1;
+	ssize_t ret = 0, sz = PAGE_SIZE;
+
+	buf1 = kmalloc(sz, GFP_KERNEL);
+	if (!buf1)
+		return -ENOMEM;
+
+	buf1[0] = 0;
+	for (i = 0; i < ARRAY_SIZE(tdev->selective_ide); ++i) {
+		if (!tdev->selective_ide[i].enable)
+			continue;
+
+		ret += snprintf(buf1 + ret, sz - ret - 1, "%u: %d%s",
+				i,
+				tdev->selective_ide[i].id,
+				tdev->selective_ide[i].def ? " DEF" : "");
+		if (tdev->selective_ide[i].enabled) {
+			u32 devst = 0, rcst = 0;
+
+			pci_ide_get_sel_sta(tdev->pdev, i, &devst);
+			pci_ide_get_sel_sta(rootport, i, &rcst);
+			ret += snprintf(buf1 + ret, sz - ret - 1,
+				" %x%s %s%s<-> %x%s %s%s rootport:%s",
+				devst,
+				PCI_IDE_SEL_STS_STATUS(devst) == 2 ? "=SECURE" : "",
+				tdev->selective_ide[i].dev_ide_cfg ? "IDECfg " : "",
+				tdev->selective_ide[i].dev_tee_limited ? "TeeLim " : "",
+				rcst,
+				PCI_IDE_SEL_STS_STATUS(rcst) == 2 ? "=SECURE" : "",
+				tdev->selective_ide[i].rootport_ide_cfg ? "IDECfg " : "",
+				tdev->selective_ide[i].rootport_tee_limited ? "TeeLim " : "",
+				pci_name(rootport)
+			       );
+		}
+		ret += snprintf(buf1 + ret, sz - ret - 1, "\n");
+	}
+	tsm_dev_put(tdev);
+
+	ret = sysfs_emit(buf, "%s", buf1);
+	kfree(buf1);
+
+	return ret;
+}
+
+static DEVICE_ATTR_RW(tsm_sel_stream);
+
+static ssize_t tsm_ide_refresh_store(struct device *dev, struct device_attribute *attr,
+				     const char *buf, size_t count)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	int ret;
+
+	ret = tsm_ide_refresh(tdev, tsm.private_data);
+	tsm_dev_put(tdev);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static DEVICE_ATTR_WO(tsm_ide_refresh);
+
+static ssize_t blob_show(struct tsm_blob *blob, char *buf)
+{
+	unsigned int n, m;
+
+	if (!blob)
+		return sysfs_emit(buf, "none\n");
+
+	n = snprintf(buf, PAGE_SIZE, "%lu %u\n", blob->len,
+		     kref_read(&blob->kref));
+	m = hex_dump_to_buffer(blob->data, blob->len, 32, 1,
+			       buf + n, PAGE_SIZE - n, false);
+	n += min(PAGE_SIZE - n, m);
+	n += snprintf(buf + n, PAGE_SIZE - n, "...\n");
+	return n;
+}
+
+static ssize_t tsm_certs_gen(struct tsm_blob *certs, char *buf, size_t len)
+{
+	struct spdm_certchain_block_header *h;
+	unsigned int n = 0, m, i, off, o2;
+	u8 *p;
+
+	for (i = 0, off = 0; off < certs->len; ++i) {
+		h = (struct spdm_certchain_block_header *) ((u8 *)certs->data + off);
+		if (WARN_ON_ONCE(h->length > certs->len - off))
+			return 0;
+
+		n += snprintf(buf + n, len - n, "[%d] len=%d:\n", i, h->length);
+
+		for (o2 = 0, p = (u8 *)&h[1]; o2 < h->length; o2 += 32) {
+			m = hex_dump_to_buffer(p + o2, h->length - o2, 32, 1,
+					       buf + n, len - n, true);
+			n += min(len - n, m);
+			n += snprintf(buf + n, len - n, "\n");
+		}
+
+		off += h->length; /* Includes the header */
+	}
+
+	return n;
+}
+
+static ssize_t tsm_certs_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t n = 0;
+
+	if (!tdev->certs) {
+		n = sysfs_emit(buf, "none\n");
+	} else {
+		n = tsm_certs_gen(tdev->certs, buf, PAGE_SIZE);
+		if (!n)
+			n = blob_show(tdev->certs, buf);
+	}
+
+	tsm_dev_put(tdev);
+	return n;
+}
+
+static DEVICE_ATTR_RO(tsm_certs);
+
+static ssize_t tsm_meas_gen(struct tsm_blob *meas, char *buf, size_t len)
+{
+	static const char * const whats[] = {
+		"ImmuROM", "MutFW", "HWCfg", "FWCfg",
+		"MeasMft", "DevDbg", "MutFWVer", "MutFWVerSec"
+	};
+	struct dmtf_measurement_block_device_mode *dm;
+	struct spdm_measurement_block_header *mb;
+	struct dmtf_measurement_block_header *h;
+	unsigned int n = 0, m, off, what;
+	bool dmtf;
+
+	for (off = 0; off < meas->len; ) {
+		mb = (struct spdm_measurement_block_header *)(((u8 *) meas->data) + off);
+		dmtf = mb->spec & 1;
+
+		n += snprintf(buf + n, len - n, "#%d (%d) ", mb->index, mb->size);
+		if (dmtf) {
+			h = (void *) &mb[1];
+
+			if (WARN_ON_ONCE(mb->size != (sizeof(*h) + h->size)))
+				return -EINVAL;
+
+			what = h->type & 0x7F;
+			n += snprintf(buf + n, len - n, "%x=[%s %s]: ",
+				h->type,
+				h->type & 0x80 ? "digest" : "raw",
+				what < ARRAY_SIZE(whats) ? whats[what] : "reserved");
+
+			if (what == 5) {
+				dm = (struct dmtf_measurement_block_device_mode *) &h[1];
+				n += snprintf(buf + n, len - n, " %x %x %x %x",
+					      dm->opmode_cap, dm->opmode_sta,
+					      dm->devmode_cap, dm->devmode_sta);
+			} else {
+				m = hex_dump_to_buffer(&h[1], h->size, 32, 1,
+						       buf + n, len - n, false);
+				n += min(len - n, m);
+			}
+		} else {
+			n += snprintf(buf + n, len - n, "spec=%x: ", mb->spec);
+			m = hex_dump_to_buffer(&mb[1], min(len - off, mb->size),
+					       32, 1, buf + n, len - n, false);
+			n += min(len - n, m);
+		}
+
+		off += sizeof(*mb) + mb->size;
+		n += snprintf(buf + n, len - n, "...\n");
+	}
+
+	return n;
+}
+
+static ssize_t tsm_meas_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	ssize_t n = 0;
+
+	if (!tdev->meas) {
+		n = sysfs_emit(buf, "none\n");
+	} else {
+		n = tsm_meas_gen(tdev->meas, buf, PAGE_SIZE);
+		if (!n)
+			n = blob_show(tdev->meas, buf);
+	}
+
+	tsm_dev_put(tdev);
+	return n;
+}
+
+static DEVICE_ATTR_RO(tsm_meas);
+
+static ssize_t tsm_dev_status_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_dev *tdev = tsm_dev_get(dev);
+	struct tsm_dev_status s = { 0 };
+	int ret = tsm_dev_status(tdev, tsm.private_data, &s);
+	ssize_t ret1;
+
+	ret1 = sysfs_emit(buf, "ret=%d\n"
+			  "ctx_state=%x\n"
+			  "tc_mask=%x\n"
+			  "certs_slot=%x\n"
+			  "device_id=%x\n"
+			  "segment_id=%x\n"
+			  "no_fw_update=%x\n",
+			  ret,
+			  s.ctx_state,
+			  s.tc_mask,
+			  s.certs_slot,
+			  s.device_id,
+			  s.segment_id,
+			  s.no_fw_update);
+
+	tsm_dev_put(tdev);
+	return ret1;
+}
+
+static DEVICE_ATTR_RO(tsm_dev_status);
+
+static struct attribute *host_dev_attrs[] = {
+	&dev_attr_tsm_cert_slot.attr,
+	&dev_attr_tsm_tc_mask.attr,
+	&dev_attr_tsm_dev_connect.attr,
+	&dev_attr_tsm_sel_stream.attr,
+	&dev_attr_tsm_ide_refresh.attr,
+	&dev_attr_tsm_certs.attr,
+	&dev_attr_tsm_meas.attr,
+	&dev_attr_tsm_dev_status.attr,
+	NULL,
+};
+static const struct attribute_group host_dev_group = {
+	.attrs = host_dev_attrs,
+};
+
+static struct attribute *guest_dev_attrs[] = {
+	&dev_attr_tsm_certs.attr,
+	&dev_attr_tsm_meas.attr,
+	NULL,
+};
+static const struct attribute_group guest_dev_group = {
+	.attrs = guest_dev_attrs,
+};
+
+static ssize_t tsm_tdi_bind_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(dev);
+
+	if (!tdi->vmid)
+		return sysfs_emit(buf, "not bound\n");
+
+	return sysfs_emit(buf, "VM=%#llx ASID=%d BDFn=%x:%x.%d\n",
+			  tdi->vmid, tdi->asid,
+			  PCI_BUS_NUM(tdi->guest_rid), PCI_SLOT(tdi->guest_rid),
+			  PCI_FUNC(tdi->guest_rid));
+}
+
+static DEVICE_ATTR_RO(tsm_tdi_bind);
+
+static ssize_t tsm_tdi_validate_store(struct device *dev, struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(dev);
+	unsigned long val;
+	ssize_t ret;
+
+	if (kstrtoul(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	if (val) {
+		ret = tsm_tdi_validate(tdi, false, tsm.private_data);
+		if (ret)
+			return ret;
+	} else {
+		tsm_tdi_validate(tdi, true, tsm.private_data);
+	}
+
+	tdi->validated = val;
+
+	return count;
+}
+
+static ssize_t tsm_tdi_validate_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(dev);
+
+	return sysfs_emit(buf, "%u\n", tdi->validated);
+}
+
+static DEVICE_ATTR_RW(tsm_tdi_validate);
+
+ssize_t tsm_report_gen(struct tsm_blob *report, char *buf, size_t len)
+{
+	struct tdi_report_header *h = TDI_REPORT_HDR(report);
+	struct tdi_report_mmio_range *mr = TDI_REPORT_MR_OFF(report);
+	struct tdi_report_footer *f = TDI_REPORT_FTR(report);
+	unsigned int n, m, i;
+
+	n = snprintf(buf, len,
+		     "no_fw_update=%u\ndma_no_pasid=%u\ndma_pasid=%u\nats=%u\nprs=%u\n",
+		     h->no_fw_update, h->dma_no_pasid, h->dma_pasid, h->ats, h->prs);
+	n += snprintf(buf + n, len - n,
+		      "msi_x_message_control=%#04x\nlnr_control=%#04x\n",
+		      h->msi_x_message_control, h->lnr_control);
+	n += snprintf(buf + n, len - n, "tph_control=%#08x\n", h->tph_control);
+
+	for (i = 0; i < h->mmio_range_count; ++i) {
+		n += snprintf(buf + n, len - n,
+			      "[%i] #%u %#016llx +%#lx MSIX%c PBA%c NonTEE%c Upd%c\n",
+			      i, mr[i].range_id, mr[i].first_page << PAGE_SHIFT,
+			      (unsigned long) mr[i].num << PAGE_SHIFT,
+			      mr[i].msix_table ? '+':'-',
+			      mr[i].msix_pba ? '+':'-',
+			      mr[i].is_non_tee_mem ? '+':'-',
+			      mr[i].is_mem_attr_updatable ? '+':'-');
+		if (mr[i].reserved)
+			n += snprintf(buf + n, len - n,
+			      "[%i] WARN: reserved=%#x\n", i, mr[i].range_attributes);
+	}
+
+	if (f->device_specific_info_len) {
+		unsigned int num = report->len - ((u8 *)f->device_specific_info - (u8 *)h);
+
+		num = min(num, f->device_specific_info_len);
+		n += snprintf(buf + n, len - n, "DevSp len=%d%s",
+			f->device_specific_info_len, num ? ": " : "");
+		m = hex_dump_to_buffer(f->device_specific_info, num, 32, 1,
+				       buf + n, len - n, false);
+		n += min(len - n, m);
+		n += snprintf(buf + n, len - n, m ? "\n" : "...\n");
+	}
+
+	return n;
+}
+EXPORT_SYMBOL_GPL(tsm_report_gen);
+
+static ssize_t tsm_report_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(dev);
+	ssize_t n = 0;
+
+	if (!tdi->report) {
+		n = sysfs_emit(buf, "none\n");
+	} else {
+		n = tsm_report_gen(tdi->report, buf, PAGE_SIZE);
+		if (!n)
+			n = blob_show(tdi->report, buf);
+	}
+
+	return n;
+}
+
+static DEVICE_ATTR_RO(tsm_report);
+
+static char *spdm_algos_to_str(u64 algos, char *buf, size_t len)
+{
+	size_t n = 0;
+
+	buf[0] = 0;
+#define __ALGO(x) do {								\
+		if ((n < len) && (algos & (1ULL << (TSM_TDI_SPDM_ALGOS_##x))))	\
+			n += snprintf(buf + n, len - n, #x" ");			\
+	} while (0)
+
+	__ALGO(DHE_SECP256R1);
+	__ALGO(DHE_SECP384R1);
+	__ALGO(AEAD_AES_128_GCM);
+	__ALGO(AEAD_AES_256_GCM);
+	__ALGO(ASYM_TPM_ALG_RSASSA_3072);
+	__ALGO(ASYM_TPM_ALG_ECDSA_ECC_NIST_P256);
+	__ALGO(ASYM_TPM_ALG_ECDSA_ECC_NIST_P384);
+	__ALGO(HASH_TPM_ALG_SHA_256);
+	__ALGO(HASH_TPM_ALG_SHA_384);
+	__ALGO(KEY_SCHED_SPDM_KEY_SCHEDULE);
+#undef __ALGO
+	return buf;
+}
+
+static const char *tdisp_state_to_str(enum tsm_tdisp_state state)
+{
+	switch (state) {
+#define __ST(x) case TDISP_STATE_##x: return #x
+	case TDISP_STATE_UNAVAIL: return "TDISP state unavailable";
+	__ST(CONFIG_UNLOCKED);
+	__ST(CONFIG_LOCKED);
+	__ST(RUN);
+	__ST(ERROR);
+#undef __ST
+	default: return "unknown";
+	}
+}
+
+static ssize_t tsm_tdi_status_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(dev);
+	struct tsm_tdi_status ts = { 0 };
+	char algos[256] = "";
+	unsigned int n, m;
+	int ret;
+
+	ret = tsm_tdi_status(tdi, tsm.private_data, &ts);
+	if (ret < 0)
+		return sysfs_emit(buf, "ret=%d\n\n", ret);
+
+	if (!ts.valid)
+		return sysfs_emit(buf, "ret=%d\nstate=%d:%s\n",
+				  ret, ts.state, tdisp_state_to_str(ts.state));
+
+	n = snprintf(buf, PAGE_SIZE,
+		     "ret=%d\n"
+		     "state=%d:%s\n"
+		     "meas_digest_fresh=%x\n"
+		     "meas_digest_valid=%x\n"
+		     "all_request_redirect=%x\n"
+		     "bind_p2p=%x\n"
+		     "lock_msix=%x\n"
+		     "no_fw_update=%x\n"
+		     "cache_line_size=%d\n"
+		     "algos=%#llx:%s\n"
+		     ,
+		     ret,
+		     ts.state, tdisp_state_to_str(ts.state),
+		     ts.meas_digest_fresh,
+		     ts.meas_digest_valid,
+		     ts.all_request_redirect,
+		     ts.bind_p2p,
+		     ts.lock_msix,
+		     ts.no_fw_update,
+		     ts.cache_line_size,
+		     ts.spdm_algos, spdm_algos_to_str(ts.spdm_algos, algos, sizeof(algos) - 1));
+
+	n += snprintf(buf + n, PAGE_SIZE - n, "Certs digest: ");
+	m = hex_dump_to_buffer(ts.certs_digest, sizeof(ts.certs_digest), 32, 1,
+			       buf + n, PAGE_SIZE - n, false);
+	n += min(PAGE_SIZE - n, m);
+	n += snprintf(buf + n, PAGE_SIZE - n, "...\nMeasurements digest: ");
+	m = hex_dump_to_buffer(ts.meas_digest, sizeof(ts.meas_digest), 32, 1,
+			       buf + n, PAGE_SIZE - n, false);
+	n += min(PAGE_SIZE - n, m);
+	n += snprintf(buf + n, PAGE_SIZE - n, "...\nInterface report digest: ");
+	m = hex_dump_to_buffer(ts.interface_report_digest, sizeof(ts.interface_report_digest),
+			       32, 1, buf + n, PAGE_SIZE - n, false);
+	n += min(PAGE_SIZE - n, m);
+	n += snprintf(buf + n, PAGE_SIZE - n, "...\n");
+
+	return n;
+}
+
+static DEVICE_ATTR_RO(tsm_tdi_status);
+
+static struct attribute *host_tdi_attrs[] = {
+	&dev_attr_tsm_tdi_bind.attr,
+	&dev_attr_tsm_report.attr,
+	&dev_attr_tsm_tdi_status.attr,
+	NULL,
+};
+
+static const struct attribute_group host_tdi_group = {
+	.attrs = host_tdi_attrs,
+};
+
+static struct attribute *guest_tdi_attrs[] = {
+	&dev_attr_tsm_tdi_validate.attr,
+	&dev_attr_tsm_report.attr,
+	&dev_attr_tsm_tdi_status.attr,
+	NULL,
+};
+
+static const struct attribute_group guest_tdi_group = {
+	.attrs = guest_tdi_attrs,
+};
+
+static int tsm_tdi_init(struct tsm_dev *tdev, struct pci_dev *pdev)
+{
+	struct tsm_tdi *tdi;
+	int ret = 0;
+
+	dev_info(&pdev->dev, "Initializing tdi\n");
+	if (!tdev)
+		return -ENODEV;
+
+	tdi = kzalloc(sizeof(*tdi), GFP_KERNEL);
+	if (!tdi)
+		return -ENOMEM;
+
+	/* tsm_dev_get() requires pdev->dev.tdi which is set later */
+	if (!kref_get_unless_zero(&tdev->kref)) {
+		ret = -EPERM;
+		goto free_exit;
+	}
+
+	if (tsm.ops->dev_connect)
+		tdi->ag = &host_tdi_group;
+	else
+		tdi->ag = &guest_tdi_group;
+
+	ret = sysfs_create_link(&pdev->dev.kobj, &tdev->pdev->dev.kobj, "tsm_dev");
+	if (ret)
+		goto free_exit;
+
+	ret = device_add_group(&pdev->dev, tdi->ag);
+	if (ret)
+		goto sysfs_unlink_exit;
+
+	tdi->tdev = tdev;
+	tdi->pdev = pci_dev_get(pdev);
+
+	pdev->dev.tdi_enabled = !pdev->is_physfn || tsm.physfn;
+	pdev->dev.tdi = tdi;
+	pci_info(pdev, "TDI enabled=%d\n", pdev->dev.tdi_enabled);
+
+	return 0;
+
+sysfs_unlink_exit:
+	sysfs_remove_link(&pdev->dev.kobj, "tsm_dev");
+free_exit:
+	kfree(tdi);
+
+	return ret;
+}
+
+static void tsm_tdi_free(struct tsm_tdi *tdi)
+{
+	tsm_dev_put(tdi->tdev);
+
+	pci_dev_put(tdi->pdev);
+
+	device_remove_group(&tdi->pdev->dev, tdi->ag);
+	sysfs_remove_link(&tdi->pdev->dev.kobj, "tsm_dev");
+	tdi->pdev->dev.tdi = NULL;
+	tdi->pdev->dev.tdi_enabled = false;
+	kfree(tdi);
+}
+
+static int tsm_dev_init(struct pci_dev *pdev, struct tsm_dev **ptdev)
+{
+	struct tsm_dev *tdev;
+	int ret = 0;
+
+	dev_info(&pdev->dev, "Initializing tdev\n");
+	tdev = kzalloc(sizeof(*tdev), GFP_KERNEL);
+	if (!tdev)
+		return -ENOMEM;
+
+	kref_init(&tdev->kref);
+	tdev->tc_mask = tsm.tc_mask;
+	tdev->cert_slot = tsm.cert_slot;
+	tdev->pdev = pci_dev_get(pdev);
+	mutex_init(&tdev->spdm_mutex);
+
+	if (tsm.ops->dev_connect)
+		tdev->ag = &host_dev_group;
+	else
+		tdev->ag = &guest_dev_group;
+
+	ret = device_add_group(&pdev->dev, tdev->ag);
+	if (ret)
+		goto pci_dev_put_exit;
+
+	if (tsm.ops->dev_connect) {
+		ret = -EPERM;
+		tdev->spdm.doe_mb = pci_find_doe_mailbox(tdev->pdev,
+							 PCI_VENDOR_ID_PCI_SIG,
+							 PCI_DOE_PROTOCOL_CMA_SPDM);
+		if (!tdev->spdm.doe_mb)
+			goto pci_dev_put_exit;
+
+		tdev->spdm.doe_mb_secured = pci_find_doe_mailbox(tdev->pdev,
+								 PCI_VENDOR_ID_PCI_SIG,
+								 PCI_DOE_PROTOCOL_SECURED_CMA_SPDM);
+		if (!tdev->spdm.doe_mb_secured)
+			goto pci_dev_put_exit;
+	}
+
+	*ptdev = tdev;
+	return 0;
+
+pci_dev_put_exit:
+	pci_dev_put(pdev);
+free_exit:
+	kfree(tdev);
+
+	return ret;
+}
+
+static void tsm_dev_free(struct kref *kref)
+{
+	struct tsm_dev *tdev = container_of(kref, struct tsm_dev, kref);
+
+	device_remove_group(&tdev->pdev->dev, tdev->ag);
+
+	if (tdev->connected)
+		tsm_dev_reclaim(tdev, tsm.private_data);
+
+	dev_info(&tdev->pdev->dev, "Freeing TDEV\n");
+	pci_dev_put(tdev->pdev);
+	kfree(tdev);
+}
+
+static int tsm_alloc_device(struct pci_dev *pdev)
+{
+	int ret = 0;
+
+	/* It is guest VM == TVM */
+	if (!tsm.ops->dev_connect) {
+		if (pdev->devcap & PCI_EXP_DEVCAP_TEE_IO) {
+			struct tsm_dev *tdev = NULL;
+
+			ret = tsm_dev_init(pdev, &tdev);
+			if (ret)
+				return ret;
+
+			ret = tsm_tdi_init(tdev, pdev);
+			tsm_dev_put(tdev);
+			return ret;
+		}
+		return 0;
+	}
+
+	if (pdev->is_physfn && (PCI_FUNC(pdev->devfn) == 0) &&
+	    (pdev->devcap & PCI_EXP_DEVCAP_TEE_IO)) {
+		struct tsm_dev *tdev = NULL;
+
+		ret = tsm_dev_init(pdev, &tdev);
+		if (ret)
+			return ret;
+
+		ret = tsm_tdi_init(tdev, pdev);
+		tsm_dev_put(tdev);
+		return ret;
+	}
+
+	if (pdev->is_virtfn) {
+		struct pci_dev *pf0 = pci_get_slot(pdev->physfn->bus,
+						   pdev->physfn->devfn & ~7);
+
+		if (pf0 && (pf0->devcap & PCI_EXP_DEVCAP_TEE_IO)) {
+			struct tsm_dev *tdev = tsm_dev_get(&pf0->dev);
+
+			ret = tsm_tdi_init(tdev, pdev);
+			tsm_dev_put(tdev);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void tsm_dev_freeice(struct device *dev)
+{
+	struct tsm_tdi *tdi = tsm_tdi_get(dev);
+
+	if (!tdi)
+		return;
+
+	tsm_tdi_free(tdi);
+}
+
+static int tsm_pci_bus_notifier(struct notifier_block *nb, unsigned long action, void *data)
+{
+	switch (action) {
+	case BUS_NOTIFY_ADD_DEVICE:
+		tsm_alloc_device(to_pci_dev(data));
+		break;
+	case BUS_NOTIFY_DEL_DEVICE:
+		tsm_dev_freeice(data);
+		break;
+	case BUS_NOTIFY_UNBOUND_DRIVER:
+		tsm_tdi_validate(tsm_tdi_get(data), true, tsm.private_data);
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block tsm_pci_bus_nb = {
+	.notifier_call = tsm_pci_bus_notifier,
+};
+
+static int __init tsm_init(void)
+{
+	int ret = 0;
+
+	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+	return ret;
+}
+
+static void __exit tsm_cleanup(void)
+{
+}
+
+void tsm_set_ops(struct tsm_ops *ops, void *private_data)
+{
+	struct pci_dev *pdev = NULL;
+	int ret;
+
+	if (!tsm.ops && ops) {
+		tsm.ops = ops;
+		tsm.private_data = private_data;
+
+		for_each_pci_dev(pdev) {
+			ret = tsm_alloc_device(pdev);
+			if (ret)
+				break;
+		}
+		bus_register_notifier(&pci_bus_type, &tsm_pci_bus_nb);
+	} else {
+		bus_unregister_notifier(&pci_bus_type, &tsm_pci_bus_nb);
+		for_each_pci_dev(pdev)
+			tsm_dev_freeice(&pdev->dev);
+		tsm.ops = ops;
+	}
+}
+EXPORT_SYMBOL_GPL(tsm_set_ops);
+
+int tsm_tdi_bind(struct tsm_tdi *tdi, u32 guest_rid, u64 vmid, u32 asid)
+{
+	int ret;
+
+	if (WARN_ON(!tsm.ops->tdi_bind))
+		return -EPERM;
+
+	tdi->guest_rid = guest_rid;
+	tdi->vmid = vmid;
+	tdi->asid = asid;
+
+	mutex_lock(&tdi->tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->tdi_bind(tdi, guest_rid, vmid, asid, tsm.private_data);
+		if (ret < 0)
+			break;
+
+		if (!ret)
+			break;
+
+		ret = spdm_forward(&tdi->tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdi->tdev->spdm_mutex);
+
+	if (ret) {
+		tsm_tdi_unbind(tdi);
+		return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tsm_tdi_bind);
+
+void tsm_tdi_unbind(struct tsm_tdi *tdi)
+{
+	tsm_tdi_reclaim(tdi, tsm.private_data);
+	tdi->vmid = 0;
+	tdi->asid = 0;
+	tdi->guest_rid = 0;
+}
+EXPORT_SYMBOL_GPL(tsm_tdi_unbind);
+
+int tsm_guest_request(struct tsm_tdi *tdi, enum tsm_tdisp_state *state, void *req_data)
+{
+	int ret;
+
+	if (!tsm.ops->guest_request)
+		return -EPERM;
+
+	mutex_lock(&tdi->tdev->spdm_mutex);
+	while (1) {
+		ret = tsm.ops->guest_request(tdi, tdi->guest_rid, tdi->vmid, req_data,
+					     state, tsm.private_data);
+		if (ret <= 0)
+			break;
+
+		ret = spdm_forward(&tdi->tdev->spdm, ret);
+		if (ret < 0)
+			break;
+	}
+	mutex_unlock(&tdi->tdev->spdm_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tsm_guest_request);
+
+struct tsm_tdi *tsm_tdi_find(u32 guest_rid, u64 vmid)
+{
+	struct pci_dev *pdev = NULL;
+	struct tsm_tdi *tdi;
+
+	for_each_pci_dev(pdev) {
+		tdi = tsm_tdi_get(&pdev->dev);
+		if (!tdi)
+			continue;
+
+		if (tdi->vmid == vmid && tdi->guest_rid == guest_rid)
+			return tdi;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(tsm_tdi_find);
+
+module_init(tsm_init);
+module_exit(tsm_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/Documentation/virt/coco/tsm.rst b/Documentation/virt/coco/tsm.rst
new file mode 100644
index 000000000000..3be6e8491e42
--- /dev/null
+++ b/Documentation/virt/coco/tsm.rst
@@ -0,0 +1,62 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+What it is
+==========
+
+This is about PCI passthrough in confidential computing (CoCo: SEV-SNP, TDX, CoVE).
+Currently, passing a PCI device through to a CoCo VM relies on SWIOTLB bouncing DMA
+via pre-shared memory buffers.
+
+PCIe IDE (Integrity and Data Encryption) and TDISP (TEE Device Interface Security
+Protocol) are protocols which enable encryption of the PCIe link and DMA to encrypted
+memory. This document focuses on DMA to an encrypted VM; encrypted host memory is
+out of scope.
+
+
+Protocols
+=========
+
+PCIe r6 DOE is a mailbox protocol for reading/writing objects from/to a device.
+Objects are of plain SPDM or secure SPDM type. SPDM is responsible for authenticating
+devices and creating a secure session between a device and the TSM.
+IDE_KM manages the PCIe link encryption keys and works on top of secure SPDM.
+TDISP manages the state of a passed-through PCI function and also works on top of
+secure SPDM.
+Additionally, PCIe defines the IDE capability which gives the host OS a way
+to enable streams on the PCIe link.
+
+
+TSM module
+==========
+
+This is a common place to trigger device authentication and key management.
+It exposes certificates/measurements/reports/status via sysfs and provides control
+over the link (limited, though, by the TSM capabilities).
+A platform is expected to register a specific set of hooks. The same module works
+in the host and guest OS, although the sets of required platform hooks differ
+considerably.
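+
+A minimal sketch (not a complete implementation) of how a platform driver might
+register its hooks; the my_*() callbacks and my_private_data are placeholders,
+only struct tsm_ops and tsm_set_ops() come from this module::
+
+  static struct tsm_ops my_ops = {
+          .dev_connect = my_dev_connect,
+          .dev_reclaim = my_dev_reclaim,
+          .tdi_bind    = my_tdi_bind,
+          .tdi_reclaim = my_tdi_reclaim,
+  };
+
+  tsm_set_ops(&my_ops, my_private_data);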
+
+
+Flow
+====
+
+At boot time, tsm.ko scans the PCI bus to find and set up TDISP-capable
+devices; it also listens to hotplug events. If setup was successful, tsm-prefixed
+nodes appear in sysfs.
+
+Then the user enables IDE by writing to /sys/bus/pci/devices/0000:e1:00.0/tsm_dev_connect;
+writing 1 connects the device, writing 2 additionally programs the selective IDE
+streams before the SPDM exchange rather than after. This is how PCIe link encryption
+is enabled.
+
+To pass the device through, a modified VMM is required.
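+
+A rough sketch of what the host-side plumbing (VFIO/KVM/VMM integration, not part
+of this module) might do with the exported API when handing a TDI to a VM;
+guest_rid, vmid and asid are assumed to be provided by the VMM::
+
+  struct tsm_tdi *tdi = tsm_tdi_get(&pdev->dev);
+  int ret = -ENODEV;
+
+  if (tdi)
+          ret = tsm_tdi_bind(tdi, guest_rid, vmid, asid);
+
+tsm_tdi_unbind() reverses the binding when the device is returned to the host.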
+
+In the VM, the same tsm.ko loads. In addition to the host's setup, the VM needs
+to receive the interface report and enable secure DMA and/or secure MMIO via some
+VM<->HV protocol (such as AMD GHCB). Once this is done, the VM can access validated
+MMIO with the C-bit set and the device can DMA to encrypted memory.
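+
+Validation in the guest can be triggered via the tsm_tdi_validate sysfs attribute
+or from a driver; a sketch of what a guest driver's probe() might do (the rest of
+the driver logic is omitted)::
+
+  int ret = pci_dev_tdi_validate(pdev);
+
+  if (ret)
+          return ret;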
+
+
+References
+==========
+
+[1] TEE Device Interface Security Protocol - TDISP - v2022-07-27
+https://members.pcisig.com/wg/PCI-SIG/document/18268?downloadRevision=21500
+[2] Security Protocol and Data Model (SPDM)
+https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.2.1.pdf
diff --git a/drivers/virt/coco/Kconfig b/drivers/virt/coco/Kconfig
index 87d142c1f932..67a9c9daf96d 100644
--- a/drivers/virt/coco/Kconfig
+++ b/drivers/virt/coco/Kconfig
@@ -7,6 +7,17 @@  config TSM_REPORTS
 	select CONFIGFS_FS
 	tristate
 
+config TSM
+	tristate "Platform support for TEE Device Interface Security Protocol (TDISP)"
+	default m
+	depends on AMD_MEM_ENCRYPT
+	select PCI_DOE
+	select PCI_IDE
+	help
+	  Add a common place for user-visible platform support for PCIe TDISP.
+	  TEE Device Interface Security Protocol (TDISP) from PCI-SIG,
+	  https://pcisig.com/tee-device-interface-security-protocol-tdisp
+
 source "drivers/virt/coco/efi_secret/Kconfig"
 
 source "drivers/virt/coco/sev-guest/Kconfig"