Message ID | db66edf2-ca66-4127-8920-ba55f4aee14e@suse.com (mailing list archive)
Series | AMD/IOMMU: re-work mode updating
On 06/11/2019 15:16, Jan Beulich wrote:
> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> the PCI devices lock held. The check occurring only when the mode
> actually needs updating, the violation of this rule by the majority
> of callers did go unnoticed until per-domain IOMMU setup was changed
> to do away with on-demand creation of IOMMU page tables.
>
> Unfortunately the only halfway reasonable fix to this that I could
> come up with requires more re-work than would seem desirable at this
> time of the release process, but addressing the issue seems
> unavoidable to me as its manifestation is a regression from the
> IOMMU page table setup re-work. The change also isn't without risk
> of further regressions - if in patch 2 I've missed a code path that
> would also need to invoke the new hook, then this might mean non-
> working guests (with passed-through devices on AMD hardware).
>
> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
> 2: introduce GFN notification for translated domains
> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode

Having now looked at all three, why don't we just delete the dynamic
height of AMD IOMMU pagetables?

This series looks suspiciously like it is adding new common
infrastructure to work around the fact we're doing something fairly dumb
to begin with.

Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
and a substantial reduction in the complexity of the IOMMU code.

~Andrew
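The cover letter's point, that the lock check only fires when the mode
actually needs updating, can be illustrated with a small sketch. This is
purely illustrative Python, not the Xen code; the names merely mirror the
ones under discussion:

```python
import threading

# Stand-in for Xen's pcidevs lock; illustrative only.
pcidevs_lock = threading.Lock()

def update_paging_mode(needs_update):
    """Toy model of a function that must be called with the lock held."""
    if needs_update:
        # The assertion is only reached when an update is needed, so
        # callers violating the locking rule otherwise go unnoticed.
        assert pcidevs_lock.locked(), "pcidevs lock must be held"
        # ... grow the page-table height here ...

update_paging_mode(False)   # lock not held, yet no complaint
```

Callers that never trigger a mode update never reach the assertion,
which is how the rule violations could stay hidden until the setup of
the page tables changed.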
On 06/11/2019 16:16, Jan Beulich wrote:
> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> the PCI devices lock held. The check occurring only when the mode
> actually needs updating, the violation of this rule by the majority
> of callers did go unnoticed until per-domain IOMMU setup was changed
> to do away with on-demand creation of IOMMU page tables.
>
> Unfortunately the only halfway reasonable fix to this that I could
> come up with requires more re-work than would seem desirable at this
> time of the release process, but addressing the issue seems
> unavoidable to me as its manifestation is a regression from the
> IOMMU page table setup re-work. The change also isn't without risk
> of further regressions - if in patch 2 I've missed a code path that
> would also need to invoke the new hook, then this might mean non-
> working guests (with passed-through devices on AMD hardware).
>
> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
> 2: introduce GFN notification for translated domains
> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>
> Jan

Hi Jan,

I just tested and I don't get the "pcidevs" message any more.
I assume this only was a fix for that issue, so it's probably expected
that the other issue:

AMD-Vi: INVALID_DEV_REQUEST 00000800 8a000000 f8000840 000000fd

and a malfunctioning device in one of the guests, is still around.

--
Sander
On 06.11.2019 19:29, Sander Eikelenboom wrote:
> On 06/11/2019 16:16, Jan Beulich wrote:
>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>> the PCI devices lock held. The check occurring only when the mode
>> actually needs updating, the violation of this rule by the majority
>> of callers did go unnoticed until per-domain IOMMU setup was changed
>> to do away with on-demand creation of IOMMU page tables.
>>
>> Unfortunately the only halfway reasonable fix to this that I could
>> come up with requires more re-work than would seem desirable at this
>> time of the release process, but addressing the issue seems
>> unavoidable to me as its manifestation is a regression from the
>> IOMMU page table setup re-work. The change also isn't without risk
>> of further regressions - if in patch 2 I've missed a code path that
>> would also need to invoke the new hook, then this might mean non-
>> working guests (with passed-through devices on AMD hardware).
>>
>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>> 2: introduce GFN notification for translated domains
>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>
> I just tested and I don't get the "pcidevs" message any more.

Thanks for testing the series.

> I assume this only was a fix for that issue, so it's probably expected
> that the other issue:
> AMD-Vi: INVALID_DEV_REQUEST 00000800 8a000000 f8000840 000000fd
> and malfunctioning device in one of the guests.
> is still around.

Indeed. Someone (possibly me) still needs to look into the other one.

Jan
On 06.11.2019 18:31, Andrew Cooper wrote:
> On 06/11/2019 15:16, Jan Beulich wrote:
>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>> the PCI devices lock held. The check occurring only when the mode
>> actually needs updating, the violation of this rule by the majority
>> of callers did go unnoticed until per-domain IOMMU setup was changed
>> to do away with on-demand creation of IOMMU page tables.
>>
>> Unfortunately the only halfway reasonable fix to this that I could
>> come up with requires more re-work than would seem desirable at this
>> time of the release process, but addressing the issue seems
>> unavoidable to me as its manifestation is a regression from the
>> IOMMU page table setup re-work. The change also isn't without risk
>> of further regressions - if in patch 2 I've missed a code path that
>> would also need to invoke the new hook, then this might mean non-
>> working guests (with passed-through devices on AMD hardware).
>>
>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>> 2: introduce GFN notification for translated domains
>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>
> Having now looked at all three, why don't we just delete the dynamic
> height of AMD IOMMU pagetables?
>
> This series looks suspiciously like it is adding new common
> infrastructure to work around the fact we're doing something fairly dumb
> to begin with.
>
> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
> and a substantial reduction in the complexity of the IOMMU code.

Yet an additional level of page walks hardware has to perform. Also
4 levels won't cover all possible 52 address bits. And finally, the
more applicable your "substantial reduction", the less suitable such
a change may be at this point of the release (but I didn't look at
this side of things in any detail, so it may well not be an issue).

Jan
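Jan's 52-bit remark follows from the level arithmetic: each page-table
level resolves 9 address bits on top of the 12-bit page offset, so 4
levels reach 48 bits and the full 52-bit space needs 5. A quick sketch
(illustrative Python, not Xen code; the helper name is made up):

```python
PAGE_SHIFT = 12       # 4k pages: the low 12 bits are the page offset
BITS_PER_LEVEL = 9    # 512 entries per table, so 9 bits per level

def levels_needed(address_bits):
    """Smallest number of page-table levels covering the given width."""
    levels = 0
    while PAGE_SHIFT + levels * BITS_PER_LEVEL < address_bits:
        levels += 1
    return levels

# 4 levels cover 12 + 4*9 = 48 bits; 52-bit addresses need a 5th level.
print(levels_needed(48))  # 4
print(levels_needed(52))  # 5
```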
On 07/11/2019 07:36, Jan Beulich wrote:
> On 06.11.2019 18:31, Andrew Cooper wrote:
>> On 06/11/2019 15:16, Jan Beulich wrote:
>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>> the PCI devices lock held. The check occurring only when the mode
>>> actually needs updating, the violation of this rule by the majority
>>> of callers did go unnoticed until per-domain IOMMU setup was changed
>>> to do away with on-demand creation of IOMMU page tables.
>>>
>>> Unfortunately the only halfway reasonable fix to this that I could
>>> come up with requires more re-work than would seem desirable at this
>>> time of the release process, but addressing the issue seems
>>> unavoidable to me as its manifestation is a regression from the
>>> IOMMU page table setup re-work. The change also isn't without risk
>>> of further regressions - if in patch 2 I've missed a code path that
>>> would also need to invoke the new hook, then this might mean non-
>>> working guests (with passed-through devices on AMD hardware).
>>>
>>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>>> 2: introduce GFN notification for translated domains
>>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>> Having now looked at all three, why don't we just delete the dynamic
>> height of AMD IOMMU pagetables?
>>
>> This series looks suspiciously like it is adding new common
>> infrastructure to work around the fact we're doing something fairly dumb
>> to begin with.
>>
>> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
>> and a substantial reduction in the complexity of the IOMMU code.
> Yet an additional level of page walks hardware has to perform. Also
> 4 levels won't cover all possible 52 address bits. And finally, the
> more applicable your "substantial reduction", the less suitable such
> a change may be at this point of the release (but I didn't look at
> this side of things in any detail, so it may well not be an issue).

There is, in practice, no such thing as an HVM guest using 2 levels.
The VRAM just below the 4G boundary will force a resize to 3 levels
during domain construction, and as a 1-line fix for 4.13, this probably
isn't the worst idea going.

There are no AMD systems which support >48 bit PA space, so 4 levels is
sufficient for now, but fundamentally details such as the size of GPA
space should be specified in domain_create() and remain static for the
lifetime of the domain.

As far as I can tell, it is only AMD systems with IOMMUs which permit
the PA space to be variable, and I still can't help but feel that this
series is attempting to work around a problem we shouldn't have in the
first place.

~Andrew
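Andrew's suggestion amounts to fixing the page-table height once, at
creation time, from a static GPA width. A minimal sketch of that idea
with hypothetical names (domain_create() here is only a stand-in for
the interface being discussed, not the real Xen one):

```python
def domain_create(gpa_bits=48):
    """Pick the IOMMU page-table height once, from a fixed GPA width.

    The height then never changes for the lifetime of the domain, so
    no locking dance around on-demand height updates is needed.
    """
    # 12-bit page offset plus 9 bits per level:
    # levels = ceil((gpa_bits - 12) / 9), via negative floor division.
    levels = -(-(gpa_bits - 12) // 9)
    return {"gpa_bits": gpa_bits, "iommu_levels": levels}

d = domain_create()   # 48-bit GPA space: 4 levels, fixed up front
```

The trade-off Jan raises applies directly: a small domain that could
have lived with 3 levels now pays for a 4-level walk on every
translation.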
On 07.11.2019 13:49, Andrew Cooper wrote:
> On 07/11/2019 07:36, Jan Beulich wrote:
>> On 06.11.2019 18:31, Andrew Cooper wrote:
>>> On 06/11/2019 15:16, Jan Beulich wrote:
>>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>>> the PCI devices lock held. The check occurring only when the mode
>>>> actually needs updating, the violation of this rule by the majority
>>>> of callers did go unnoticed until per-domain IOMMU setup was changed
>>>> to do away with on-demand creation of IOMMU page tables.
>>>>
>>>> Unfortunately the only halfway reasonable fix to this that I could
>>>> come up with requires more re-work than would seem desirable at this
>>>> time of the release process, but addressing the issue seems
>>>> unavoidable to me as its manifestation is a regression from the
>>>> IOMMU page table setup re-work. The change also isn't without risk
>>>> of further regressions - if in patch 2 I've missed a code path that
>>>> would also need to invoke the new hook, then this might mean non-
>>>> working guests (with passed-through devices on AMD hardware).
>>>>
>>>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>>>> 2: introduce GFN notification for translated domains
>>>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>>> Having now looked at all three, why don't we just delete the dynamic
>>> height of AMD IOMMU pagetables?
>>>
>>> This series looks suspiciously like it is adding new common
>>> infrastructure to work around the fact we're doing something fairly dumb
>>> to begin with.
>>>
>>> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
>>> and a substantial reduction in the complexity of the IOMMU code.
>> Yet an additional level of page walks hardware has to perform. Also
>> 4 levels won't cover all possible 52 address bits. And finally, the
>> more applicable your "substantial reduction", the less suitable such
>> a change may be at this point of the release (but I didn't look at
>> this side of things in any detail, so it may well not be an issue).
>
> There is, in practice, no such thing as an HVM guest using 2 levels.
> The VRAM just below the 4G boundary will force a resize to 3 levels
> during domain construction, and as a 1-line fix for 4.13, this probably
> isn't the worst idea going.

So here (with the 1-line fix remark) you talk about 3 levels. Yet
switching the 2 that we start from to 3 won't fix anything, as we still
may need to go to 4 for huge guests. Such a change would merely
eliminate the indeed pretty pointless move from 2 to 3 which now
happens for all domains as their memory gets populated.

> There are no AMD systems which support >48 bit PA space, so 4 levels is
> sufficient for now, but fundamentally details such as the size of GPA
> space should be specified in domain_create() and remain static for the
> lifetime of the domain.

I agree GPA dimensions ought to be static. But the number-of-levels
adjustment the code does has nothing to do with variable GPA
boundaries. Even for a domain with a, say, 36-bit GFN space it may be
beneficial to run with only 3 levels, as long as it has "little" enough
memory assigned. In fact the number of levels the hardware has to walk
is the one aspect you don't even touch in your reply.

Jan
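Jan's distinction can be made concrete: the dynamic scheme sizes the
table height to the highest GFN actually mapped, not to the nominal GFN
width. A small sketch, assuming 9 GFN bits per level (illustrative
Python, not the actual AMD IOMMU code; reading the "36-bit GFN space"
as a 36-bit GFN width is this sketch's assumption):

```python
def levels_for_max_gfn(max_gfn):
    """Smallest height whose top table still covers max_gfn."""
    levels = 1
    while max_gfn >= 1 << (levels * 9):   # each level indexes 9 GFN bits
        levels += 1
    return levels

# A full 36-bit GFN space would nominally need 4 levels ...
print(levels_for_max_gfn((1 << 36) - 1))        # 4
# ... but with only 8GiB populated (max GFN below 2^21), 3 suffice,
# and every hardware walk is one memory access shorter.
print(levels_for_max_gfn((8 << 30) // 4096 - 1))  # 3
```

This is the benefit the dynamic height buys, at the cost of the
mode-update machinery (and its locking rule) that the series reworks.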