Message ID | 20231018160733.24655-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-4.18,v2] iommu/vt-d: fix SAGAW capability parsing | expand |
On 18/10/2023 5:07 pm, Roger Pau Monne wrote: > Enabling 5 level paging support (AGAW 3) at this point in the release is too > risky, This comment will go stale extremely quickly. "It is too late in the Xen 4.18 release to enable support for 5 level paging, ..." ~Andrew > so instead put a bodge to unconditionally disable passthough mode if > SAGAW has any bits greater than 2 set. Ignore bit 0, it's reserved in the > specification but unlikely to have any meaning in the future.
On Wed, Oct 18, 2023 at 06:04:29PM +0100, Andrew Cooper wrote: > On 18/10/2023 5:07 pm, Roger Pau Monne wrote: > > Enabling 5 level paging support (AGAW 3) at this point in the release is too > > risky, > > This comment will go stale extremely quickly. > > "It is too late in the Xen 4.18 release to enable support for 5 level > paging, ..." I was hoping it would be close enough to the branching and tagging, but your wording removes that ambiguity, so that's better. Will wait for further comments. Thanks.
On 18.10.2023 18:07, Roger Pau Monne wrote: > SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 and 5 > level page tables respectively. According to the Intel VT-d specification, an > IOMMU can report multiple SAGAW bits being set. > > Commit 859d11b27912 claims to replace the open-coded find_first_set_bit(), but > it's actually replacing an open coded implementation to find the last set bit. > The change forces the used AGAW to the lowest supported by the IOMMU instead of > the highest one between 1 and 2. > > Restore the previous SAGAW parsing by using fls() instead of > find_first_set_bit(), in order to get the highest (supported) AGAW to be used. > > However there's a caveat related to the value the AW context entry field must > be set to when using passthrough mode: > > "When the Translation-type (TT) field indicates pass-through processing (10b), > this field must be programmed to indicate the largest AGAW value supported by > hardware." [0] > > Newer Intel IOMMU implementations support 5 level page tables for the IOMMU, > and signal such support in SAGAW bit 3. > > Enabling 5 level paging support (AGAW 3) at this point in the release is too > risky, so instead put a bodge to unconditionally disable passthough mode if > SAGAW has any bits greater than 2 set. Ignore bit 0, it's reserved in the > specification but unlikely to have any meaning in the future. May be worth mentioning that in earlier versions this indicated 2-level paging support. > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1327,15 +1327,24 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) > > /* Calculate number of pagetable levels: 3 or 4. */ > sagaw = cap_sagaw(iommu->cap); > - if ( sagaw & 6 ) > - agaw = find_first_set_bit(sagaw & 6); > - if ( !agaw ) > + agaw = fls(sagaw & 6) - 1; > + if ( agaw == -1 ) Would you mind making this "< 0" or even "<= 0"? The latter in particular would already cover the likely future change of dropping the masking by 6. > { > printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw); > print_iommu_regs(drhd); > rc = -ENODEV; > goto free; > } > + if ( sagaw >> 3 ) > + { > + printk_once(XENLOG_WARNING VTDPREFIX > + "IOMMU: unhandled bits set in sagaw (%#x)%s\n", I think IOMMU: is redundant with VTDPREFIX (or alternatively iommu->index would also want logging). Also note that VTDPREFIX (bogusly) has no trailing space. (I realize both apply to the other log message in context as well, but still. I'd be inclined to adjust that at the same time, including switching to %#x as you have it in the new log message.) > + sagaw, > + iommu_hwdom_passthrough ? " disabling passthrough" : "" ); May want a leading comma (or some other separator) in the string. > + if ( iommu_hwdom_passthrough ) > + iommu_hwdom_passthrough = false; No real need for if() here. I'd be happy to adjust any of the mentioned items while committing, but of course I would first need to know which ones you agree with. Since all of them are cosmetic, either way Reviewed-by: Jan Beulich <jbeulich@suse.com> Jan
On Thu, Oct 19, 2023 at 09:41:41AM +0200, Jan Beulich wrote: > On 18.10.2023 18:07, Roger Pau Monne wrote: > > SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 and 5 > > level page tables respectively. According to the Intel VT-d specification, an > > IOMMU can report multiple SAGAW bits being set. > > > > Commit 859d11b27912 claims to replace the open-coded find_first_set_bit(), but > > it's actually replacing an open coded implementation to find the last set bit. > > The change forces the used AGAW to the lowest supported by the IOMMU instead of > > the highest one between 1 and 2. > > > > Restore the previous SAGAW parsing by using fls() instead of > > find_first_set_bit(), in order to get the highest (supported) AGAW to be used. > > > > However there's a caveat related to the value the AW context entry field must > > be set to when using passthrough mode: > > > > "When the Translation-type (TT) field indicates pass-through processing (10b), > > this field must be programmed to indicate the largest AGAW value supported by > > hardware." [0] > > > > Newer Intel IOMMU implementations support 5 level page tables for the IOMMU, > > and signal such support in SAGAW bit 3. > > > > Enabling 5 level paging support (AGAW 3) at this point in the release is too > > risky, so instead put a bodge to unconditionally disable passthough mode if > > SAGAW has any bits greater than 2 set. Ignore bit 0, it's reserved in the > > specification but unlikely to have any meaning in the future. > > May be worth mentioning that in earlier versions this indicated 2-level > paging support. Oh, that's not even present in my copy of the spec from 2016. I guess it was removed very, very long time ago? > > --- a/xen/drivers/passthrough/vtd/iommu.c > > +++ b/xen/drivers/passthrough/vtd/iommu.c > > @@ -1327,15 +1327,24 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) > > > > /* Calculate number of pagetable levels: 3 or 4. */ > > sagaw = cap_sagaw(iommu->cap); > > - if ( sagaw & 6 ) > > - agaw = find_first_set_bit(sagaw & 6); > > - if ( !agaw ) > > + agaw = fls(sagaw & 6) - 1; > > + if ( agaw == -1 ) > > Would you mind making this "< 0" or even "<= 0"? The latter in particular > would already cover the likely future change of dropping the masking by 6. My plan wasn't to drop the masking, but use 0xe if we support AGAW 3. I'm fine with using < or <= if you think it's more robust. > > { > > printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw); > > print_iommu_regs(drhd); > > rc = -ENODEV; > > goto free; > > } > > + if ( sagaw >> 3 ) > > + { > > + printk_once(XENLOG_WARNING VTDPREFIX > > + "IOMMU: unhandled bits set in sagaw (%#x)%s\n", > > I think IOMMU: is redundant with VTDPREFIX (or alternatively iommu->index > would also want logging). Also note that VTDPREFIX (bogusly) has no > trailing space. (I realize both apply to the other log message in context > as well, but still. I'd be inclined to adjust that at the same time, > including switching to %#x as you have it in the new log message.) Oh, I didn't realize VTDPREFIX had no trailing space. Since it's a printk_once(), not sure iommu->index is really useful here, as we would report just one IOMMU has having an unhandled SAGAW. IMO if we switch to printing iommu->index we must also use a plain printk. But I don't see a lot of benefit in printing this for likely each IOMMU on the system, and hence I would rather use printk_once() and not print the index. Feel free to drop the IOMMU prefix, but I'm not sure what to do with VTDPREFIX and the missing trialing space, as some users of VTDPREFIX already account for such missing space. > > + sagaw, > > + iommu_hwdom_passthrough ? " disabling passthrough" : "" ); > > May want a leading comma (or some other separator) in the string. > > > + if ( iommu_hwdom_passthrough ) > > + iommu_hwdom_passthrough = false; > > No real need for if() here. Not really, but also no need for a write to iommu_hwdom_passthrough every time an IOMMU is initialized if the condition is removed. > I'd be happy to adjust any of the mentioned items while committing, but > of course I would first need to know which ones you agree with. Since all > of them are cosmetic, either way > Reviewed-by: Jan Beulich <jbeulich@suse.com> Let me know what you think re the log message. Thanks, Roger.
On 19.10.2023 10:15, Roger Pau Monné wrote: > On Thu, Oct 19, 2023 at 09:41:41AM +0200, Jan Beulich wrote: >> On 18.10.2023 18:07, Roger Pau Monne wrote: >>> SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 and 5 >>> level page tables respectively. According to the Intel VT-d specification, an >>> IOMMU can report multiple SAGAW bits being set. >>> >>> Commit 859d11b27912 claims to replace the open-coded find_first_set_bit(), but >>> it's actually replacing an open coded implementation to find the last set bit. >>> The change forces the used AGAW to the lowest supported by the IOMMU instead of >>> the highest one between 1 and 2. >>> >>> Restore the previous SAGAW parsing by using fls() instead of >>> find_first_set_bit(), in order to get the highest (supported) AGAW to be used. >>> >>> However there's a caveat related to the value the AW context entry field must >>> be set to when using passthrough mode: >>> >>> "When the Translation-type (TT) field indicates pass-through processing (10b), >>> this field must be programmed to indicate the largest AGAW value supported by >>> hardware." [0] >>> >>> Newer Intel IOMMU implementations support 5 level page tables for the IOMMU, >>> and signal such support in SAGAW bit 3. >>> >>> Enabling 5 level paging support (AGAW 3) at this point in the release is too >>> risky, so instead put a bodge to unconditionally disable passthough mode if >>> SAGAW has any bits greater than 2 set. Ignore bit 0, it's reserved in the >>> specification but unlikely to have any meaning in the future. >> >> May be worth mentioning that in earlier versions this indicated 2-level >> paging support. > > Oh, that's not even present in my copy of the spec from 2016. I guess > it was removed very, very long time ago? Indeed, as mentioned in the commit you're fixing. Version 1.3 of the spec still has it. Judging by the document numbers 2.2 may have been its direct successor (i.e. no further 1.x and no 2.0 or 2.1). >>> --- a/xen/drivers/passthrough/vtd/iommu.c >>> +++ b/xen/drivers/passthrough/vtd/iommu.c >>> @@ -1327,15 +1327,24 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) >>> >>> /* Calculate number of pagetable levels: 3 or 4. */ >>> sagaw = cap_sagaw(iommu->cap); >>> - if ( sagaw & 6 ) >>> - agaw = find_first_set_bit(sagaw & 6); >>> - if ( !agaw ) >>> + agaw = fls(sagaw & 6) - 1; >>> + if ( agaw == -1 ) >> >> Would you mind making this "< 0" or even "<= 0"? The latter in particular >> would already cover the likely future change of dropping the masking by 6. > > My plan wasn't to drop the masking, but use 0xe if we support AGAW 3. But we will also need to deal with bit 4 (at which point applying a mask is going to be useless code). We can either guess that it's going to mean 6-level paging, or we need to treat it as having unknown meaning when set (implying that we'd then still need to either fail initialization or disable pass-through mode). > I'm fine with using < or <= if you think it's more robust. Good, will do so then. >>> { >>> printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw); >>> print_iommu_regs(drhd); >>> rc = -ENODEV; >>> goto free; >>> } >>> + if ( sagaw >> 3 ) >>> + { >>> + printk_once(XENLOG_WARNING VTDPREFIX >>> + "IOMMU: unhandled bits set in sagaw (%#x)%s\n", >> >> I think IOMMU: is redundant with VTDPREFIX (or alternatively iommu->index >> would also want logging). Also note that VTDPREFIX (bogusly) has no >> trailing space. (I realize both apply to the other log message in context >> as well, but still. I'd be inclined to adjust that at the same time, >> including switching to %#x as you have it in the new log message.) > > Oh, I didn't realize VTDPREFIX had no trailing space. > > Since it's a printk_once(), not sure iommu->index is really useful > here, as we would report just one IOMMU has having an unhandled SAGAW. > IMO if we switch to printing iommu->index we must also use a plain > printk. But I don't see a lot of benefit in printing this for likely > each IOMMU on the system, and hence I would rather use printk_once() > and not print the index. Well, logging the index in printk_once() has the benefit of identifying the first IOMMU with the issue, which may help further analysis if not all of them have bits beyond 2 set. But I'm not going to insist on this aspect. > Feel free to drop the IOMMU prefix, but I'm not sure what to do with > VTDPREFIX and the missing trialing space, as some users of VTDPREFIX > already account for such missing space. I'd simply insert a leading space in the string literal. >>> + sagaw, >>> + iommu_hwdom_passthrough ? " disabling passthrough" : "" ); >> >> May want a leading comma (or some other separator) in the string. >> >>> + if ( iommu_hwdom_passthrough ) >>> + iommu_hwdom_passthrough = false; >> >> No real need for if() here. > > Not really, but also no need for a write to iommu_hwdom_passthrough > every time an IOMMU is initialized if the condition is removed. This is init-time code, and hence the excess writes aren't going to be noticable. Jan
On Thu, Oct 19, 2023 at 10:49:29AM +0200, Jan Beulich wrote: > On 19.10.2023 10:15, Roger Pau Monné wrote: > > On Thu, Oct 19, 2023 at 09:41:41AM +0200, Jan Beulich wrote: > >> On 18.10.2023 18:07, Roger Pau Monne wrote: > >>> SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 and 5 > >>> level page tables respectively. According to the Intel VT-d specification, an > >>> IOMMU can report multiple SAGAW bits being set. > >>> > >>> Commit 859d11b27912 claims to replace the open-coded find_first_set_bit(), but > >>> it's actually replacing an open coded implementation to find the last set bit. > >>> The change forces the used AGAW to the lowest supported by the IOMMU instead of > >>> the highest one between 1 and 2. > >>> > >>> Restore the previous SAGAW parsing by using fls() instead of > >>> find_first_set_bit(), in order to get the highest (supported) AGAW to be used. > >>> > >>> However there's a caveat related to the value the AW context entry field must > >>> be set to when using passthrough mode: > >>> > >>> "When the Translation-type (TT) field indicates pass-through processing (10b), > >>> this field must be programmed to indicate the largest AGAW value supported by > >>> hardware." [0] > >>> > >>> Newer Intel IOMMU implementations support 5 level page tables for the IOMMU, > >>> and signal such support in SAGAW bit 3. > >>> > >>> Enabling 5 level paging support (AGAW 3) at this point in the release is too > >>> risky, so instead put a bodge to unconditionally disable passthough mode if > >>> SAGAW has any bits greater than 2 set. Ignore bit 0, it's reserved in the > >>> specification but unlikely to have any meaning in the future. > >> > >> May be worth mentioning that in earlier versions this indicated 2-level > >> paging support. > > > > Oh, that's not even present in my copy of the spec from 2016. I guess > > it was removed very, very long time ago? > > Indeed, as mentioned in the commit you're fixing. Version 1.3 of the > spec still has it. Judging by the document numbers 2.2 may have been > its direct successor (i.e. no further 1.x and no 2.0 or 2.1). > > >>> --- a/xen/drivers/passthrough/vtd/iommu.c > >>> +++ b/xen/drivers/passthrough/vtd/iommu.c > >>> @@ -1327,15 +1327,24 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) > >>> > >>> /* Calculate number of pagetable levels: 3 or 4. */ > >>> sagaw = cap_sagaw(iommu->cap); > >>> - if ( sagaw & 6 ) > >>> - agaw = find_first_set_bit(sagaw & 6); > >>> - if ( !agaw ) > >>> + agaw = fls(sagaw & 6) - 1; > >>> + if ( agaw == -1 ) > >> > >> Would you mind making this "< 0" or even "<= 0"? The latter in particular > >> would already cover the likely future change of dropping the masking by 6. > > > > My plan wasn't to drop the masking, but use 0xe if we support AGAW 3. > > But we will also need to deal with bit 4 (at which point applying a mask > is going to be useless code). We can either guess that it's going to mean > 6-level paging, or we need to treat it as having unknown meaning when set > (implying that we'd then still need to either fail initialization or > disable pass-through mode). I wouldn't enable support for AGAW 4 unless we have a way to test it. It's safer to just disable passthrough mode if SAGAW bit 4 is set. > > I'm fine with using < or <= if you think it's more robust. > > Good, will do so then. > > >>> { > >>> printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw); > >>> print_iommu_regs(drhd); > >>> rc = -ENODEV; > >>> goto free; > >>> } > >>> + if ( sagaw >> 3 ) > >>> + { > >>> + printk_once(XENLOG_WARNING VTDPREFIX > >>> + "IOMMU: unhandled bits set in sagaw (%#x)%s\n", > >> > >> I think IOMMU: is redundant with VTDPREFIX (or alternatively iommu->index > >> would also want logging). Also note that VTDPREFIX (bogusly) has no > >> trailing space. (I realize both apply to the other log message in context > >> as well, but still. I'd be inclined to adjust that at the same time, > >> including switching to %#x as you have it in the new log message.) > > > > Oh, I didn't realize VTDPREFIX had no trailing space. > > > > Since it's a printk_once(), not sure iommu->index is really useful > > here, as we would report just one IOMMU has having an unhandled SAGAW. > > IMO if we switch to printing iommu->index we must also use a plain > > printk. But I don't see a lot of benefit in printing this for likely > > each IOMMU on the system, and hence I would rather use printk_once() > > and not print the index. > > Well, logging the index in printk_once() has the benefit of identifying > the first IOMMU with the issue, which may help further analysis if not > all of them have bits beyond 2 set. But I'm not going to insist on this > aspect. > > > Feel free to drop the IOMMU prefix, but I'm not sure what to do with > > VTDPREFIX and the missing trialing space, as some users of VTDPREFIX > > already account for such missing space. > > I'd simply insert a leading space in the string literal. Ack. > >>> + sagaw, > >>> + iommu_hwdom_passthrough ? " disabling passthrough" : "" ); > >> > >> May want a leading comma (or some other separator) in the string. > >> > >>> + if ( iommu_hwdom_passthrough ) > >>> + iommu_hwdom_passthrough = false; > >> > >> No real need for if() here. > > > > Not really, but also no need for a write to iommu_hwdom_passthrough > > every time an IOMMU is initialized if the condition is removed. > > This is init-time code, and hence the excess writes aren't going to be > noticable. I don't have a strong opinion, TBH I used to have it without the conditional and added it later. If you prefer to drop the conditional I won't oppose. Thanks, Roger.
Hi Roger, Jan, > On Oct 19, 2023, at 15:41, Jan Beulich <jbeulich@suse.com> wrote: > > On 18.10.2023 18:07, Roger Pau Monne wrote: > > I'd be happy to adjust any of the mentioned items while committing, but > of course I would first need to know which ones you agree with. Since all > of them are cosmetic, either way > Reviewed-by: Jan Beulich <jbeulich@suse.com> I am not sure if there is a v3 or the v2 will be committed with some committers handling the comments, but either way I am ok with the release-ack tag below: Release-acked-by: Henry Wang <Henry.Wang@arm.com> Kind regards, Henry > > Jan
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index ceef7359e553..d2211ecc0b1b 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1327,15 +1327,24 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) /* Calculate number of pagetable levels: 3 or 4. */ sagaw = cap_sagaw(iommu->cap); - if ( sagaw & 6 ) - agaw = find_first_set_bit(sagaw & 6); - if ( !agaw ) + agaw = fls(sagaw & 6) - 1; + if ( agaw == -1 ) { printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported sagaw %x\n", sagaw); print_iommu_regs(drhd); rc = -ENODEV; goto free; } + if ( sagaw >> 3 ) + { + printk_once(XENLOG_WARNING VTDPREFIX + "IOMMU: unhandled bits set in sagaw (%#x)%s\n", + sagaw, + iommu_hwdom_passthrough ? " disabling passthrough" : "" ); + if ( iommu_hwdom_passthrough ) + iommu_hwdom_passthrough = false; + } + iommu->nr_pt_levels = agaw_to_level(agaw); if ( min_pt_levels > iommu->nr_pt_levels ) min_pt_levels = iommu->nr_pt_levels;
SAGAW is a bitmap field, with bits 1, 2 and 3 signaling support for 3, 4 and 5 level page tables respectively. According to the Intel VT-d specification, an IOMMU can report multiple SAGAW bits being set. Commit 859d11b27912 claims to replace the open-coded find_first_set_bit(), but it's actually replacing an open coded implementation to find the last set bit. The change forces the used AGAW to the lowest supported by the IOMMU instead of the highest one between 1 and 2. Restore the previous SAGAW parsing by using fls() instead of find_first_set_bit(), in order to get the highest (supported) AGAW to be used. However there's a caveat related to the value the AW context entry field must be set to when using passthrough mode: "When the Translation-type (TT) field indicates pass-through processing (10b), this field must be programmed to indicate the largest AGAW value supported by hardware." [0] Newer Intel IOMMU implementations support 5 level page tables for the IOMMU, and signal such support in SAGAW bit 3. Enabling 5 level paging support (AGAW 3) at this point in the release is too risky, so instead put a bodge to unconditionally disable passthough mode if SAGAW has any bits greater than 2 set. Ignore bit 0, it's reserved in the specification but unlikely to have any meaning in the future. Note the message about unhandled SAGAW bits being set is printed unconditionally, regardless of whether passthrough mode is enabled. This is done in order to easily notice IOMMU implementations with not yet supported SAGAW values. [0] Intel VT Directed Spec Rev 4.1 Fixes: 859d11b27912 ('VT-d: prune SAGAW recognition') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Changes since v1: - Reword commit message - Put a bodge for SAGAW bit 3. --- xen/drivers/passthrough/vtd/iommu.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)