Message ID | bc61d9cd-5da4-e35c-fb21-eeba5ab6e529@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | IOMMU: make domctl handler tolerate NULL domain | expand |
On 19/04/2022 10:39, Jan Beulich wrote: > Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, > XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed > here, when the domctl was passed DOMID_INVALID. > > Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") > Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
On 19.04.22 11:39, Jan Beulich wrote: > Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, > XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed > here, when the domctl was passed DOMID_INVALID. > > Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") > Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Juergen
On 19/04/2022 10:39, Jan Beulich wrote: > Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, > XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed > here, when the domctl was passed DOMID_INVALID. > > Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") > Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> > Signed-off-by: Jan Beulich <jbeulich@suse.com> I disagree with the Reported-by tag here. At best, it's "also noticed while investigating". Furthermore, under what circumstances is test_assign_device legitimate when passing DOMID_INVALID ? This has been broken for 3 years now without report, so it's clearly an unused codepath under both xl's and xapi's idea of passthrough. Finally, as I said in Juergen's email. The root cause of the bug reported is that non-IOMMMU subops are ending up here. That needs fixing at the caller to iommu_do_domctl(), not inside it. ~Andrew
On 19.04.2022 12:49, Andrew Cooper wrote: > On 19/04/2022 10:39, Jan Beulich wrote: >> Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, >> XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed >> here, when the domctl was passed DOMID_INVALID. >> >> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >> Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I disagree with the Reported-by tag here. At best, it's "also noticed > while investigating". One can view that way as well, sure. But this change alone would be sufficient to address the report. (As would be Jürgen's change alone.) > Furthermore, under what circumstances is test_assign_device legitimate > when passing DOMID_INVALID ? This has been broken for 3 years now > without report, so it's clearly an unused codepath under both xl's and > xapi's idea of passthrough. I guess xend had a way to drive the domctl this way. Iirc this was to find out whether a device is assignable at all, without needing to know of any particular valid domain. Jan
On 19/04/2022 11:59, Jan Beulich wrote: > On 19.04.2022 12:49, Andrew Cooper wrote: >> On 19/04/2022 10:39, Jan Beulich wrote: >> Furthermore, under what circumstances is test_assign_device legitimate >> when passing DOMID_INVALID ? This has been broken for 3 years now >> without report, so it's clearly an unused codepath under both xl's and >> xapi's idea of passthrough. > I guess xend had a way to drive the domctl this way. Looking at the xend code, it always called with domid 0. I can't see any evidence that there has actually been a caller passing DOMID_INVALID, but this is an utter mess. > Iirc this was > to find out whether a device is assignable at all, without needing > to know of any particular valid domain. In a correctly architected IOMMU subsystem (which Xen most definitely does not have at this juncture), that question is "Does the device have an upstream IOMMU". Xen doesn't currently have a working idea of an x86 system with IOMMUs but not covering all devices. While such a system is unlikely to exist in reality, there are cases where quirks create asymmetric coverage. Either way, this is fully subsumed by the future work to assign IOMMU groups. Also at the moment, because Xen doesn't support per-device IOMMU contexts, another check not performed is whether this devices identity maps are compatible with the identity maps in the target domain. Extra complexity here is that it will not occur on a single system (Conflicting RMRRs/IVHDs on a single system is definitely a firmware bug, not an accurate description of the system); only with migration between systems where equivalent logical devices have differing identity requirements on different systems. I don't see anything useful the "with a domid" version gets you. ~Andrew
On 19/04/2022 10:39, Jan Beulich wrote: > Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, > XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed > here, when the domctl was passed DOMID_INVALID. > > Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") > Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/drivers/passthrough/iommu.c > +++ b/xen/drivers/passthrough/iommu.c > @@ -558,7 +558,7 @@ int iommu_do_domctl( > { > int ret = -ENODEV; > > - if ( !is_iommu_enabled(d) ) > + if ( !(d ? is_iommu_enabled(d) : iommu_enabled) ) > return -EOPNOTSUPP; Having spent the better part of a day debugging this mess, this patch is plain broken. It depends on Juergen's "xen/iommu: cleanup iommu related domctl handling" patch, because otherwise it erroneously fails non-IOMMU subops. ~Andrew
On 19.04.22 17:48, Andrew Cooper wrote: > On 19/04/2022 10:39, Jan Beulich wrote: >> Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, >> XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed >> here, when the domctl was passed DOMID_INVALID. >> >> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >> Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> >> --- a/xen/drivers/passthrough/iommu.c >> +++ b/xen/drivers/passthrough/iommu.c >> @@ -558,7 +558,7 @@ int iommu_do_domctl( >> { >> int ret = -ENODEV; >> >> - if ( !is_iommu_enabled(d) ) >> + if ( !(d ? is_iommu_enabled(d) : iommu_enabled) ) >> return -EOPNOTSUPP; > > Having spent the better part of a day debugging this mess, this patch is > plain broken. > > It depends on Juergen's "xen/iommu: cleanup iommu related domctl > handling" patch, because otherwise it erroneously fails non-IOMMU subops. Which is not a real problem, as it is being called after all other subops didn't match. Or with my 3rd patch applied it is called only for IOMMU subops. Juergen
On 19/04/2022 16:52, Juergen Gross wrote: > On 19.04.22 17:48, Andrew Cooper wrote: >> On 19/04/2022 10:39, Jan Beulich wrote: >>> Besides the reporter's issue of hitting a NULL deref when >>> !CONFIG_GDBSX, >>> XEN_DOMCTL_test_assign_device can legitimately end up having NULL >>> passed >>> here, when the domctl was passed DOMID_INVALID. >>> >>> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >>> Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>> >>> --- a/xen/drivers/passthrough/iommu.c >>> +++ b/xen/drivers/passthrough/iommu.c >>> @@ -558,7 +558,7 @@ int iommu_do_domctl( >>> { >>> int ret = -ENODEV; >>> - if ( !is_iommu_enabled(d) ) >>> + if ( !(d ? is_iommu_enabled(d) : iommu_enabled) ) >>> return -EOPNOTSUPP; >> >> Having spent the better part of a day debugging this mess, this patch is >> plain broken. >> >> It depends on Juergen's "xen/iommu: cleanup iommu related domctl >> handling" patch, because otherwise it erroneously fails non-IOMMU >> subops. > > Which is not a real problem, as it is being called after all other > subops didn't match. It is a real problem even now, because it is bogus for the host or domain's IOMMU status to have any alteration to the XEN_DOMCTL_gdbsx_guestmemio path. The root cause of this bug is the existing XEN_DOMCTL_gdbsx_guestmemio case being compiled out in the !GDBSX case. It would be a more obvious problem if there were calls chained after iommu_do_domctl() in the arch_domctl() default: blocks, because then it wouldn't only be compiled-out functionality which hit this check. ~Andrew
On 19.04.2022 17:39, Andrew Cooper wrote: > On 19/04/2022 11:59, Jan Beulich wrote: >> On 19.04.2022 12:49, Andrew Cooper wrote: >>> On 19/04/2022 10:39, Jan Beulich wrote: >>> Furthermore, under what circumstances is test_assign_device legitimate >>> when passing DOMID_INVALID ? This has been broken for 3 years now >>> without report, so it's clearly an unused codepath under both xl's and >>> xapi's idea of passthrough. >> I guess xend had a way to drive the domctl this way. > > Looking at the xend code, it always called with domid 0. > > I can't see any evidence that there has actually been a caller passing > DOMID_INVALID, but this is an utter mess. > >> Iirc this was >> to find out whether a device is assignable at all, without needing >> to know of any particular valid domain. > > In a correctly architected IOMMU subsystem (which Xen most definitely > does not have at this juncture), that question is "Does the device have > an upstream IOMMU". > > Xen doesn't currently have a working idea of an x86 system with IOMMUs > but not covering all devices. While such a system is unlikely to exist > in reality, there are cases where quirks create asymmetric coverage. > Either way, this is fully subsumed by the future work to assign IOMMU > groups. > > Also at the moment, because Xen doesn't support per-device IOMMU > contexts, another check not performed is whether this devices identity > maps are compatible with the identity maps in the target domain. Extra > complexity here is that it will not occur on a single system > (Conflicting RMRRs/IVHDs on a single system is definitely a firmware > bug, not an accurate description of the system); only with migration > between systems where equivalent logical devices have differing identity > requirements on different systems. > > > I don't see anything useful the "with a domid" version gets you. Hence I guess someone thought allowing DOMID_INVALID there would be a good idea, irrespective of whether xend actually did things this way. Jan
On 19.04.2022 18:06, Andrew Cooper wrote: > On 19/04/2022 16:52, Juergen Gross wrote: >> On 19.04.22 17:48, Andrew Cooper wrote: >>> On 19/04/2022 10:39, Jan Beulich wrote: >>>> Besides the reporter's issue of hitting a NULL deref when >>>> !CONFIG_GDBSX, >>>> XEN_DOMCTL_test_assign_device can legitimately end up having NULL >>>> passed >>>> here, when the domctl was passed DOMID_INVALID. >>>> >>>> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >>>> Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>> >>>> --- a/xen/drivers/passthrough/iommu.c >>>> +++ b/xen/drivers/passthrough/iommu.c >>>> @@ -558,7 +558,7 @@ int iommu_do_domctl( >>>> { >>>> int ret = -ENODEV; >>>> - if ( !is_iommu_enabled(d) ) >>>> + if ( !(d ? is_iommu_enabled(d) : iommu_enabled) ) >>>> return -EOPNOTSUPP; >>> >>> Having spent the better part of a day debugging this mess, this patch is >>> plain broken. >>> >>> It depends on Juergen's "xen/iommu: cleanup iommu related domctl >>> handling" patch, because otherwise it erroneously fails non-IOMMU >>> subops. >> >> Which is not a real problem, as it is being called after all other >> subops didn't match. > > It is a real problem even now, because it is bogus for the host or > domain's IOMMU status to have any alteration to the > XEN_DOMCTL_gdbsx_guestmemio path. The root cause of this bug is the > existing XEN_DOMCTL_gdbsx_guestmemio case being compiled out in the > !GDBSX case. I find your wording ("plain broken" in particular) irritating, to put it mildly. The change in behavior is that -EOPNOTSUPP may now be returned for the gdbsx operation instead of -ENOSYS. And that's when it would better have been -EOPNOTSUPP in the first place. Apart from this I don't see a dependency on Jürgen's patch, so please let me know if there's anything crucial I'm overlooking. Otherwise, with two R-b, I'm intending to put in the patch irrespective of your replies. > It would be a more obvious problem if there were calls chained after > iommu_do_domctl() in the arch_domctl() default: blocks, because then it > wouldn't only be compiled-out functionality which hit this check. But that's not the case. Jan
On 20/04/2022 06:52, Jan Beulich wrote: > On 19.04.2022 18:06, Andrew Cooper wrote: >> On 19/04/2022 16:52, Juergen Gross wrote: >>> On 19.04.22 17:48, Andrew Cooper wrote: >>>> On 19/04/2022 10:39, Jan Beulich wrote: >>>>> Besides the reporter's issue of hitting a NULL deref when >>>>> !CONFIG_GDBSX, >>>>> XEN_DOMCTL_test_assign_device can legitimately end up having NULL >>>>> passed >>>>> here, when the domctl was passed DOMID_INVALID. >>>>> >>>>> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >>>>> Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> >>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>> >>>>> --- a/xen/drivers/passthrough/iommu.c >>>>> +++ b/xen/drivers/passthrough/iommu.c >>>>> @@ -558,7 +558,7 @@ int iommu_do_domctl( >>>>> { >>>>> int ret = -ENODEV; >>>>> - if ( !is_iommu_enabled(d) ) >>>>> + if ( !(d ? is_iommu_enabled(d) : iommu_enabled) ) >>>>> return -EOPNOTSUPP; >>>> Having spent the better part of a day debugging this mess, this patch is >>>> plain broken. >>>> >>>> It depends on Juergen's "xen/iommu: cleanup iommu related domctl >>>> handling" patch, because otherwise it erroneously fails non-IOMMU >>>> subops. >>> Which is not a real problem, as it is being called after all other >>> subops didn't match. >> It is a real problem even now, because it is bogus for the host or >> domain's IOMMU status to have any alteration to the >> XEN_DOMCTL_gdbsx_guestmemio path. The root cause of this bug is the >> existing XEN_DOMCTL_gdbsx_guestmemio case being compiled out in the >> !GDBSX case. > I find your wording ("plain broken" in particular) irritating, to put > it mildly. The change in behavior is that -EOPNOTSUPP may now be > returned for the gdbsx operation instead of -ENOSYS. It's not just gdbsx operations - it's every domctl subop whose case statement happens to get conditionally compiled out: XEN_DOMCTL_set_access_required XEN_DOMCTL_debug_op XEN_DOMCTL_mem_sharing_op XEN_DOMCTL_audit_p2m and every future domctl. I didn't trying reasoning about the differing populations of each arches arch_do_domctl(). > And that's when > it would better have been -EOPNOTSUPP in the first place. This irrelevant unless you have a time machine, or you can prove that the change doesn't break things. For the record, I didn't know about Juergen's discovery of 2 ENOSYS vs EOPNOTSUPP breakages in xen.git alone when writing the email. The mass, and spurious, change to almost 2^32 subops was enough to give pause for thought. >> It would be a more obvious problem if there were calls chained after >> iommu_do_domctl() in the arch_domctl() default: blocks, because then it >> wouldn't only be compiled-out functionality which hit this check. > But that's not the case. There is timebomb which just exploded on a user, and you've provided a patch claiming to defuse it, when all you have done is swap out one trigger for another. Specifically, you've replaced a latent bug (nothing actually calls test_assign_device with DOMID_INVALID) with a real error metastability for logic that really does care about ENOSYS vs EOPNOTSUPP. Yes - we should decide whether it ought be legal to call test_assign_device with DOMID_INVALID or not, and then write the reasoning down in the same patch which adjusts do_domctl() and/or iommu_do_domctl() to have consistent behaviour. But until iommu_do_domctl() is filtered to not operate on unrelated subops, making this change breaks more things than it fixes. ~Andrew
--- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -558,7 +558,7 @@ int iommu_do_domctl( { int ret = -ENODEV; - if ( !is_iommu_enabled(d) ) + if ( !(d ? is_iommu_enabled(d) : iommu_enabled) ) return -EOPNOTSUPP; #ifdef CONFIG_HAS_PCI
Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX, XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed here, when the domctl was passed DOMID_INVALID. Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>