Message ID | 20230724111335.107427-5-yi.l.liu@intel.com (mailing list archive)
---|---
State | New
Series | Add Intel VT-d nested translation
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
>  }
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
> + * This could be used for nested translation based vIOMMU. e.g. guest IOVA

s/could be/is/

> + * and guest shared virtual address. In this case, the first level page
> + * tables are used for GVA/GIOVA-GPA translation in the guest, second level
> + * page tables are used for GPA-HPA translation.

let's be consistent on using stage-1/stage-2

btw the convention is to have a 1-line summary, then the list of
parameters followed by a detailed explanation of the function.

> + *
> + * @iommu: IOMMU which the device belong to
> + * @dev: Device to be set up for translation
> + * @pasid: PASID to be programmed in the device PASID table
> + * @domain: User stage-1 domain nested on a s2 domain
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
> +			     u32 pasid, struct dmar_domain *domain)
> +{
> +	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
> +	struct dmar_domain *s2_domain = domain->s2_domain;
> +	u16 did = domain_id_iommu(domain, iommu);
> +	struct dma_pte *pgd = s2_domain->pgd;
> +	struct pasid_entry *pte;
> +
> +	if (!ecap_nest(iommu->ecap)) {
> +		pr_err_ratelimited("%s: No nested translation support\n",
> +				   iommu->name);
> +		return -ENODEV;
> +	}

-EINVAL

> +
> +	if (s2_domain->agaw > iommu->agaw) {
> +		pr_err_ratelimited("Incompatible agaw %s\n", iommu->name);
> +		return -EINVAL;
> +	}

there is a duplicated check in intel_nested_attach_dev().

> +
> +	pasid_set_slptr(pte, virt_to_phys(pgd));
> +	pasid_set_fault_enable(pte);
> +	pasid_set_domain_id(pte, did);
> +	pasid_set_address_width(pte, s2_domain->agaw);
> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> +	pasid_set_present(pte);
> +	spin_unlock(&iommu->lock);
> +

this lacks handling of force_snooping
On 2023/8/2 15:10, Tian, Kevin wrote:
>> +
>> +	pasid_set_slptr(pte, virt_to_phys(pgd));
>> +	pasid_set_fault_enable(pte);
>> +	pasid_set_domain_id(pte, did);
>> +	pasid_set_address_width(pte, s2_domain->agaw);
>> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
>> +	pasid_set_present(pte);
>> +	spin_unlock(&iommu->lock);
>> +
> this lacks handling of force_snooping

If stage-2 domain has force_snooping attribute set, then we should set
the bit field in the pasid entry, right?

How about below additional change?

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 19ac4084913b..86db81ec91f1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
 		pasid_set_eafe(pte);
 
+	if (s2_domain>force_snooping)
+		pasid_set_pgsnp(pte);
+
 	pasid_set_slptr(pte, virt_to_phys(pgd));
 	pasid_set_fault_enable(pte);
 	pasid_set_domain_id(pte, did);

Best regards,
baolu
On 2023/8/2 16:09, Baolu Lu wrote:
> On 2023/8/2 15:10, Tian, Kevin wrote:
>>> +
>>> +	pasid_set_slptr(pte, virt_to_phys(pgd));
>>> +	pasid_set_fault_enable(pte);
>>> +	pasid_set_domain_id(pte, did);
>>> +	pasid_set_address_width(pte, s2_domain->agaw);
>>> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>>> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
>>> +	pasid_set_present(pte);
>>> +	spin_unlock(&iommu->lock);
>>> +
>> this lacks handling of force_snooping
>
> If stage-2 domain has force_snooping attribute set, then we should set
> the bit field in the pasid entry, right?
>
> How about below additional change?
>
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 19ac4084913b..86db81ec91f1 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu
> *iommu, struct device *dev,
>  	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
>  		pasid_set_eafe(pte);
>
> +	if (s2_domain>force_snooping)

+	if (s2_domain->force_snooping)

Sorry for the typo.

> +		pasid_set_pgsnp(pte);
> +
>  	pasid_set_slptr(pte, virt_to_phys(pgd));
>  	pasid_set_fault_enable(pte);
>  	pasid_set_domain_id(pte, did);
>
> Best regards,
> baolu
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, August 2, 2023 4:12 PM
>
> On 2023/8/2 16:09, Baolu Lu wrote:
> > On 2023/8/2 15:10, Tian, Kevin wrote:
> >>> +
> >>> +	pasid_set_slptr(pte, virt_to_phys(pgd));
> >>> +	pasid_set_fault_enable(pte);
> >>> +	pasid_set_domain_id(pte, did);
> >>> +	pasid_set_address_width(pte, s2_domain->agaw);
> >>> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> >>> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> >>> +	pasid_set_present(pte);
> >>> +	spin_unlock(&iommu->lock);
> >>> +
> >> this lacks handling of force_snooping
> >
> > If stage-2 domain has force_snooping attribute set, then we should set
> > the bit field in the pasid entry, right?
> >
> > How about below additional change?

yes

> >
> > diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> > index 19ac4084913b..86db81ec91f1 100644
> > --- a/drivers/iommu/intel/pasid.c
> > +++ b/drivers/iommu/intel/pasid.c
> > @@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu
> > *iommu, struct device *dev,
> >  	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
> >  		pasid_set_eafe(pte);
> >
> > +	if (s2_domain>force_snooping)
>
> +	if (s2_domain->force_snooping)
>
> Sorry for the typo.
>
> > +		pasid_set_pgsnp(pte);
> > +
> >  	pasid_set_slptr(pte, virt_to_phys(pgd));
> >  	pasid_set_fault_enable(pte);
> >  	pasid_set_domain_id(pte, did);
> >
> > Best regards,
> > baolu
On 2023/8/2 15:10, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, July 24, 2023 7:13 PM
>>  }
>> +
>> +/**
>> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
>> + * This could be used for nested translation based vIOMMU. e.g. guest IOVA
> s/could be/is/

Ack.

>
>> + * and guest shared virtual address. In this case, the first level page
>> + * tables are used for GVA/GIOVA-GPA translation in the guest, second level
>> + * page tables are used for GPA-HPA translation.
> let's be consistent on using stage-1/stage-2
>
> btw the convention is to have a 1-line summary, then the list of
> parameters followed by a detailed explanation of the function.
>

This patch just follows the existing code style in this file. We need a
separate patch to clean this up.

>> + *
>> + * @iommu: IOMMU which the device belong to
>> + * @dev: Device to be set up for translation
>> + * @pasid: PASID to be programmed in the device PASID table
>> + * @domain: User stage-1 domain nested on a s2 domain
>> + */
>> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
>> +			     u32 pasid, struct dmar_domain *domain)
>> +{
>> +	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
>> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
>> +	struct dmar_domain *s2_domain = domain->s2_domain;
>> +	u16 did = domain_id_iommu(domain, iommu);
>> +	struct dma_pte *pgd = s2_domain->pgd;
>> +	struct pasid_entry *pte;
>> +
>> +	if (!ecap_nest(iommu->ecap)) {
>> +		pr_err_ratelimited("%s: No nested translation support\n",
>> +				   iommu->name);
>> +		return -ENODEV;
>> +	}
> -EINVAL

This is in the attach domain path. -EINVAL has the special meaning of
"this domain is not compatible with the iommu for the device".

So here, I still think we should return -ENODEV and the caller doesn't
need to retry anymore.

>
>> +
>> +	if (s2_domain->agaw > iommu->agaw) {
>> +		pr_err_ratelimited("Incompatible agaw %s\n", iommu->name);
>> +		return -EINVAL;
>> +	}
> there is a duplicated check in intel_nested_attach_dev().
>

Yeah, it should be removed.

Best regards,
baolu
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, August 3, 2023 11:13 AM
>
> On 2023/8/2 15:10, Tian, Kevin wrote:
> >> From: Liu, Yi L <yi.l.liu@intel.com>
> >> Sent: Monday, July 24, 2023 7:13 PM
> >> +
> >> +	if (!ecap_nest(iommu->ecap)) {
> >> +		pr_err_ratelimited("%s: No nested translation support\n",
> >> +				   iommu->name);
> >> +		return -ENODEV;
> >> +	}
> > -EINVAL
>
> This is in the attach domain path. -EINVAL has the special meaning of
> "this domain is not compatible with the iommu for the device".
>
> So here, I still think we should return -ENODEV and the caller doesn't
> need to retry anymore.
>

You are right. I overlooked that this validation is for a device cap.
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..af9cfd2d5c52 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -335,6 +335,15 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[0], 1 << 1, 0);
 }
 
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
 /*
  * Setup the WPE(Write Protect Enable) field (Bit 132) of a
  * scalable mode PASID entry.
@@ -402,6 +411,15 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, u32 pasid)
@@ -713,3 +731,112 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 	if (!cap_caching_mode(iommu->cap))
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * This could be used for nested translation based vIOMMU. e.g. guest IOVA
+ * and guest shared virtual address. In this case, the first level page
+ * tables are used for GVA/GIOVA-GPA translation in the guest, second level
+ * page tables are used for GPA-HPA translation.
+ *
+ * @iommu: IOMMU which the device belong to
+ * @dev: Device to be set up for translation
+ * @pasid: PASID to be programmed in the device PASID table
+ * @domain: User stage-1 domain nested on a s2 domain
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dma_pte *pgd = s2_domain->pgd;
+	struct pasid_entry *pte;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err_ratelimited("%s: No nested translation support\n",
+				   iommu->name);
+		return -ENODEV;
+	}
+
+	/*
+	 * Address width should match in two dimensions: CPU vs. IOMMU,
+	 * guest vs. host.
+	 */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+#ifdef CONFIG_X86
+	case ADDR_WIDTH_5LEVEL:
+		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
+		    !cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+#endif
+	default:
+		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if (s2_domain->agaw > iommu->agaw) {
+		pr_err_ratelimited("Incompatible agaw %s\n", iommu->name);
+		return -EINVAL;
+	}
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+	if (pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EBUSY;
+	}
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_S1_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
+		pasid_set_eafe(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, s2_domain->agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..864b12848392 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -111,6 +111,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
 				 u32 pasid, bool fault_ignore);