Message ID | 1406230067-926-1-git-send-email-will.deacon@arm.com (mailing list archive) |
---|---|
State | New, archived |
On 24 July 2014 20:27, Will Deacon <will.deacon@arm.com> wrote:
> If the physical address of GICV isn't page-aligned, then we end up
> creating a stage-2 mapping of the page containing it, which causes us to
> map neighbouring memory locations directly into the guest.
>
> As an example, consider a platform with GICV at physical 0x2c02f000
> running a 64k-page host kernel. If qemu maps this into the guest at
> 0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
> map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
> physical regions may cause UNPREDICTABLE behaviour, for example, on the
> Juno platform this will cause an SError exception to EL3, which brings
> down the entire physical CPU resulting in RCU stalls / HYP panics / host
> crashing / wasted weeks of debugging.

This seems to me like a specific problem with Juno rather than an
issue with having the GICV at a non-page-aligned start. The
requirement to be able to expose host GICV as the guest GICC
in a 64K pages system is just "nothing else in that 64K page
(or pages, if the GICV runs across two pages) is allowed to be
unsafe for the guest to touch", which remains true whether the
GICV starts at 0K in the 64K page or 60K.

> SBSA recommends that systems alias the 4k GICV across the bounding 64k
> region, in which case GICV physical could be described as 0x2c020000 in
> the above scenario.

The SBSA "make every 4K region in the 64K page be the same thing"
recommendation is one way of satisfying the requirement that the
whole 64K page is safe for the guest to touch. (Making the rest of
the page RAZ/WI would be another option I guess.) If your system
actually implements the SBSA recommendation then in fact
describing the GICV-phys-base as the 64K-aligned address is wrong,
because then the register at GICV-base + 4K would not be
the first register in the 2nd page of the GICV, it would be another
copy of the 1st page. This happens to work on Linux guests
currently because they don't touch anything in the 2nd page,
but for cases like device passthrough IIRC we might well like
the guest to use some of the 2nd page registers. So the only
correct choice on those systems is to specify the +60K address
as the GICV physaddr in the device tree, and use Marc's patchset
to allow QEMU/kvmtool to determine the page offset within the 64K
page so it can reflect that in the guest's device tree.

I can't think of any way of determining whether a particular
system gets this right or wrong automatically, which suggests
perhaps we need to allow the device tree to specify that the
GICV is 64k-page-safe...

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
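
The aliasing argument in the message above can be sketched in C. This is only an illustration under a simplified, assumed decode model (it is not taken from the SBSA text or from any real platform): the GICV has two 4K register pages, and each is aliased across its own 64K-aligned region. The helper name `gicv_reg_page` and the constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  0x1000ULL
#define SZ_64K 0x10000ULL

/*
 * Assumed decode model (hypothetical): GICV register page 0 is aliased
 * across [region_base, region_base + 64K) and register page 1 across
 * the following 64K. Returns the GICV register page (0 or 1) that a
 * physical address decodes to.
 */
static unsigned gicv_reg_page(uint64_t region_base, uint64_t phys)
{
	assert(region_base % SZ_64K == 0);
	assert(phys >= region_base && phys < region_base + 2 * SZ_64K);
	return (unsigned)((phys - region_base) / SZ_64K);
}
```

Under this model, describing the base as 0x2c020000 means `gicv_reg_page(0x2c020000, 0x2c020000 + SZ_4K)` is still register page 0, so the guest never reaches the second page. Describing it as 0x2c02f000 puts base + 4K at 0x2c030000, which decodes to register page 1.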
On Thu, Jul 24, 2014 at 08:47:23PM +0100, Peter Maydell wrote:
> On 24 July 2014 20:27, Will Deacon <will.deacon@arm.com> wrote:
> > If the physical address of GICV isn't page-aligned, then we end up
> > creating a stage-2 mapping of the page containing it, which causes us to
> > map neighbouring memory locations directly into the guest.
> >
> > As an example, consider a platform with GICV at physical 0x2c02f000
> > running a 64k-page host kernel. If qemu maps this into the guest at
> > 0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
> > map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
> > physical regions may cause UNPREDICTABLE behaviour, for example, on the
> > Juno platform this will cause an SError exception to EL3, which brings
> > down the entire physical CPU resulting in RCU stalls / HYP panics / host
> > crashing / wasted weeks of debugging.
>
> This seems to me like a specific problem with Juno rather than an
> issue with having the GICV at a non-page-aligned start. The
> requirement to be able to expose host GICV as the guest GICC
> in a 64K pages system is just "nothing else in that 64K page
> (or pages, if the GICV runs across two pages) is allowed to be
> unsafe for the guest to touch", which remains true whether the
> GICV starts at 0K in the 64K page or 60K.

I agree, and for that we would need a new ioctl so we can query the
page-offset of the GICV on systems where it is safe. Given that such an
ioctl doesn't exist today, I would like to plug the hole in mainline kernels
with this patch, we can relax in the future if systems appear where it would
be safe to map the entire 64k region.

> > SBSA recommends that systems alias the 4k GICV across the bounding 64k
> > region, in which case GICV physical could be described as 0x2c020000 in
> > the above scenario.
>
> The SBSA "make every 4K region in the 64K page be the same thing"
> recommendation is one way of satisfying the requirement that the
> whole 64K page is safe for the guest to touch. (Making the rest of
> the page RAZ/WI would be another option I guess.) If your system
> actually implements the SBSA recommendation then in fact
> describing the GICV-phys-base as the 64K-aligned address is wrong,
> because then the register at GICV-base + 4K would not be
> the first register in the 2nd page of the GICV, it would be another
> copy of the 1st page. This happens to work on Linux guests
> currently because they don't touch anything in the 2nd page,
> but for cases like device passthrough IIRC we might well like
> the guest to use some of the 2nd page registers. So the only
> correct choice on those systems is to specify the +60K address
> as the GICV physaddr in the device tree, and use Marc's patchset
> to allow QEMU/kvmtool to determine the page offset within the 64K
> page so it can reflect that in the guest's device tree.

Again, that can be solved by introducing Marc's attr for determining the
GICV offset within the 64k page. I don't think that's -stable material.

> I can't think of any way of determining whether a particular
> system gets this right or wrong automatically, which suggests
> perhaps we need to allow the device tree to specify that the
> GICV is 64k-page-safe...

When we support such systems, I also think we'll need a device-tree change.
My main concern right now is stopping the ability to hose the entire machine
by trying to instantiate a virtual GIC.

Will
On 07/24/2014 02:47 PM, Peter Maydell wrote:
> On 24 July 2014 20:27, Will Deacon <will.deacon@arm.com> wrote:
>> If the physical address of GICV isn't page-aligned, then we end up
>> creating a stage-2 mapping of the page containing it, which causes us to
>> map neighbouring memory locations directly into the guest.
>>
>> As an example, consider a platform with GICV at physical 0x2c02f000
>> running a 64k-page host kernel. If qemu maps this into the guest at
>> 0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
>> map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
>> physical regions may cause UNPREDICTABLE behaviour, for example, on the
>> Juno platform this will cause an SError exception to EL3, which brings
>> down the entire physical CPU resulting in RCU stalls / HYP panics / host
>> crashing / wasted weeks of debugging.
> This seems to me like a specific problem with Juno rather than an
> issue with having the GICV at a non-page-aligned start. The
> requirement to be able to expose host GICV as the guest GICC
> in a 64K pages system is just "nothing else in that 64K page
> (or pages, if the GICV runs across two pages) is allowed to be
> unsafe for the guest to touch", which remains true whether the
> GICV starts at 0K in the 64K page or 60K.
>
>> SBSA recommends that systems alias the 4k GICV across the bounding 64k
>> region, in which case GICV physical could be described as 0x2c020000 in
>> the above scenario.
> The SBSA "make every 4K region in the 64K page be the same thing"
> recommendation is one way of satisfying the requirement that the
> whole 64K page is safe for the guest to touch. (Making the rest of
> the page RAZ/WI would be another option I guess.) If your system
> actually implements the SBSA recommendation then in fact
> describing the GICV-phys-base as the 64K-aligned address is wrong,
> because then the register at GICV-base + 4K would not be
> the first register in the 2nd page of the GICV, it would be another
> copy of the 1st page. This happens to work on Linux guests
> currently because they don't touch anything in the 2nd page,
> but for cases like device passthrough IIRC we might well like
> the guest to use some of the 2nd page registers. So the only
> correct choice on those systems is to specify the +60K address
> as the GICV physaddr in the device tree, and use Marc's patchset
> to allow QEMU/kvmtool to determine the page offset within the 64K
> page so it can reflect that in the guest's device tree.

I have one of those systems specifying +60K address as the GICV physaddr
and it works well for me with 64K pages and kvm with both QEMU and kvmtool.

>
> I can't think of any way of determining whether a particular
> system gets this right or wrong automatically, which suggests
> perhaps we need to allow the device tree to specify that the
> GICV is 64k-page-safe...

I don't have a better solution, despite my lack of enthusiasm for yet
another device tree property.
On 07/24/2014 02:55 PM, Will Deacon wrote:
> On Thu, Jul 24, 2014 at 08:47:23PM +0100, Peter Maydell wrote:
>> On 24 July 2014 20:27, Will Deacon <will.deacon@arm.com> wrote:
>>> If the physical address of GICV isn't page-aligned, then we end up
>>> creating a stage-2 mapping of the page containing it, which causes us to
>>> map neighbouring memory locations directly into the guest.
>>>
>>> As an example, consider a platform with GICV at physical 0x2c02f000
>>> running a 64k-page host kernel. If qemu maps this into the guest at
>>> 0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
>>> map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
>>> physical regions may cause UNPREDICTABLE behaviour, for example, on the
>>> Juno platform this will cause an SError exception to EL3, which brings
>>> down the entire physical CPU resulting in RCU stalls / HYP panics / host
>>> crashing / wasted weeks of debugging.
>> This seems to me like a specific problem with Juno rather than an
>> issue with having the GICV at a non-page-aligned start. The
>> requirement to be able to expose host GICV as the guest GICC
>> in a 64K pages system is just "nothing else in that 64K page
>> (or pages, if the GICV runs across two pages) is allowed to be
>> unsafe for the guest to touch", which remains true whether the
>> GICV starts at 0K in the 64K page or 60K.
> I agree, and for that we would need a new ioctl so we can query the
> page-offset of the GICV on systems where it is safe. Given that such an
> ioctl doesn't exist today, I would like to plug the hole in mainline kernels
> with this patch, we can relax in the future if systems appear where it would
> be safe to map the entire 64k region.

I have such a system.
On 24 July 2014 20:55, Will Deacon <will.deacon@arm.com> wrote:
> Again, that can be solved by introducing Marc's attr for determining the
> GICV offset within the 64k page. I don't think that's -stable material.

Agreed that we don't want to put Marc's patchset in -stable
(and that without it systems with GICV in their host devicetree
at pagebase+60K are unusable, so we're not actually regressing
anything if we put this into stable). But...

>> I can't think of any way of determining whether a particular
>> system gets this right or wrong automatically, which suggests
>> perhaps we need to allow the device tree to specify that the
>> GICV is 64k-page-safe...
>
> When we support such systems, I also think we'll need a device-tree change.
> My main concern right now is stopping the ability to hose the entire machine
> by trying to instantiate a virtual GIC.

...I don't see how your patch prevents instantiating a VGIC
and hosing the machine on a system where the 64K
with the GICV registers in it goes

 [GICV registers] [machine blows up if you read this]
 0K               8K                                 64K

Whether the 64K page contains Bad Stuff is completely
orthogonal to whether the device tree offset the host has
for the GICV is 0K, 60K or anything in between. What you
should be checking for is "is this system design broken?",
which is probably a device tree attribute.

thanks
-- PMM
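
The orthogonality point above can be made concrete with a small C sketch. It is an illustration only: the helper names are hypothetical, and the 8K GICV size is read off the diagram in the message rather than from any hardware manual. It separates the question "is the base page-aligned?" from "how many non-GICV bytes does a page-granular mapping expose?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of page_size (which must be a power of two). */
static bool is_page_aligned(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (addr & (page_size - 1)) == 0;
}

/*
 * Number of bytes outside [base, base + size) that become guest-visible
 * when that range is mapped in whole pages of page_size bytes.
 */
static uint64_t extra_bytes_exposed(uint64_t base, uint64_t size,
				    uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	uint64_t first = base & ~(page_size - 1);
	uint64_t end = (base + size + page_size - 1) & ~(page_size - 1);
	return (end - first) - size;
}
```

For an 8K GICV at the 64K-aligned address 0x2c020000, `is_page_aligned` returns true, yet `extra_bytes_exposed(0x2c020000, 0x2000, 0x10000)` is still 0xe000 (56K). Whether those 56K are safe to touch is a property of the platform that neither function can determine.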
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 56ff9bebb577..fa9a95b3ed19 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1526,17 +1526,25 @@ int kvm_vgic_hyp_init(void)
 		goto out_unmap;
 	}
 
-	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
-		 vctrl_res.start, vgic_maint_irq);
-	on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
-
 	if (of_address_to_resource(vgic_node, 3, &vcpu_res)) {
 		kvm_err("Cannot obtain VCPU resource\n");
 		ret = -ENXIO;
 		goto out_unmap;
 	}
+
+	if (!PAGE_ALIGNED(vcpu_res.start)) {
+		kvm_err("GICV physical address 0x%llx not page aligned\n",
+			(unsigned long long)vcpu_res.start);
+		ret = -ENXIO;
+		goto out_unmap;
+	}
+
 	vgic_vcpu_base = vcpu_res.start;
 
+	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
+		 vctrl_res.start, vgic_maint_irq);
+	on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
+
 	goto out;
 
 out_unmap:
If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.

As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.

SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.

This patch fixes the problem by failing the vgic probe if the physical
address of GICV isn't page-aligned. Note that this generated a warning
in dmesg about freeing enabled IRQs, so I had to move the IRQ enabling
later in the probe.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joel Schopp <joel.schopp@amd.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---

Paolo, Gleb,

This fixes a *really* nasty bug with 64k-page hosts and KVM. I believe
Marc and Christoffer are both on holiday at the moment (not together),
so could you please take this as an urgent fix? Without it, I can
trivially bring down machines using kvm.

I've checked that it applies cleanly against -next, so you shouldn't see
any conflicts during the merge window.

Thanks,

Will

 virt/kvm/arm/vgic.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
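
The address arithmetic in the commit message can be reproduced with a short C sketch. It is a simplified userspace model, not kernel code: it assumes a single page-granular stage-2 mapping from the guest page containing the guest GICC address to the host page containing GICV, and the helper names `page_base` and `guest_to_host` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the start of its page; page_size must be a power of two. */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/*
 * Host physical address reached by a guest access, assuming one
 * page-granular mapping from the guest page containing guest_gicc to
 * the host page containing host_gicv. guest_addr must lie in that page.
 */
static uint64_t guest_to_host(uint64_t guest_addr, uint64_t guest_gicc,
			      uint64_t host_gicv, uint64_t page_size)
{
	uint64_t guest_page = page_base(guest_gicc, page_size);

	assert(guest_addr >= guest_page && guest_addr - guest_page < page_size);
	return page_base(host_gicv, page_size) + (guest_addr - guest_page);
}
```

With GICV at 0x2c02f000, the guest mapping at 0x80010000 and 64K pages, guest addresses 0x80010000 and 0x8001efff translate to host 0x2c020000 and 0x2c02efff, matching the range quoted in the commit message. Only guest address 0x8001f000 reaches the GICV itself.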