Message ID | 20220204063459.680961-6-andr2000@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | PCI devices passthrough on Arm, part 3 | expand |
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > pci_to_dev(pdev), flag); > } > > + rc = vpci_assign_device(d, pdev); > + > done: > if ( rc ) > printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", There's no attempt to undo anything in the case of getting back an error. ISTR this being deemed okay on the basis that the tool stack would then take whatever action, but whatever it is that is supposed to deal with errors here wants spelling out in the description. What's important is that no caller up the call tree may be left with the impression that the device is still owned by the original domain. With how you have it, the device is going to be owned by the new domain, but not really usable. > --- a/xen/drivers/vpci/vpci.c > +++ b/xen/drivers/vpci/vpci.c > @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev) > > return rc; > } > + > +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT > +/* Notify vPCI that device is assigned to guest. */ > +int vpci_assign_device(struct domain *d, struct pci_dev *pdev) > +{ > + int rc; > + > + if ( !has_vpci(d) ) > + return 0; > + > + rc = vpci_add_handlers(pdev); > + if ( rc ) > + vpci_deassign_device(d, pdev); > + > + return rc; > +} > + > +/* Notify vPCI that device is de-assigned from guest. */ > +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) > +{ > + if ( !has_vpci(d) ) > + return; > + > + vpci_remove_device(pdev); > +} > +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ While for the latter function you look to need two parameters, do you really need them also in the former one? Symmetry considerations make me wonder though whether the de-assign hook shouldn't be called earlier, when pdev->domain still has the original owner. At which point the 2nd parameter could disappear there as well. Jan
On 07.02.22 18:28, Jan Beulich wrote: > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >> pci_to_dev(pdev), flag); >> } >> >> + rc = vpci_assign_device(d, pdev); >> + >> done: >> if ( rc ) >> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", > There's no attempt to undo anything in the case of getting back an > error. ISTR this being deemed okay on the basis that the tool stack > would then take whatever action, but whatever it is that is supposed > to deal with errors here wants spelling out in the description. Why? I don't change the previously expected decision and implementation of the assign_device function: I use error paths as they were used before for the existing code. So, I see no clear reason to stress that the existing and new code relies on the toolstack > What's important is that no caller up the call tree may be left with > the impression that the device is still owned by the original > domain. With how you have it, the device is going to be owned by the > new domain, but not really usable. This is not true: vpci_assign_device will call vpci_deassign_device internally if it fails. So, the device won't be assigned in this case > >> --- a/xen/drivers/vpci/vpci.c >> +++ b/xen/drivers/vpci/vpci.c >> @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev) >> >> return rc; >> } >> + >> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT >> +/* Notify vPCI that device is assigned to guest. */ >> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev) >> +{ >> + int rc; >> + >> + if ( !has_vpci(d) ) >> + return 0; >> + >> + rc = vpci_add_handlers(pdev); >> + if ( rc ) >> + vpci_deassign_device(d, pdev); >> + >> + return rc; >> +} >> + >> +/* Notify vPCI that device is de-assigned from guest. */ >> +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) >> +{ >> + if ( !has_vpci(d) ) >> + return; >> + >> + vpci_remove_device(pdev); >> +} >> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ > While for the latter function you look to need two parameters, do you > really need them also in the former one? Do you mean instead of passing d we could just use pdev->domain? int vpci_assign_device(struct pci_dev *pdev) +{ + int rc; + + if ( !has_vpci(pdev->domain) ) + return 0; Yes, we probably can, but the rest of functions called from assign_device are accepting both d and pdev, so not sure why would we want these two be any different. Any good reason not to change others as well then? > Symmetry considerations make me wonder though whether the de-assign > hook shouldn't be called earlier, when pdev->domain still has the > original owner. At which point the 2nd parameter could disappear there > as well. static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) { [snip] vpci_deassign_device(pdev->domain, pdev); [snip] rc = vpci_assign_device(d, pdev); It looks ok to me > Jan > Thank you, Oleksandr
On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: > On 07.02.22 18:28, Jan Beulich wrote: >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>> pci_to_dev(pdev), flag); >>> } >>> >>> + rc = vpci_assign_device(d, pdev); >>> + >>> done: >>> if ( rc ) >>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >> There's no attempt to undo anything in the case of getting back an >> error. ISTR this being deemed okay on the basis that the tool stack >> would then take whatever action, but whatever it is that is supposed >> to deal with errors here wants spelling out in the description. > Why? I don't change the previously expected decision and implementation > of the assign_device function: I use error paths as they were used before > for the existing code. So, I see no clear reason to stress that the existing > and new code relies on the toolstack Saying half a sentence on this is helping review. >> What's important is that no caller up the call tree may be left with >> the impression that the device is still owned by the original >> domain. With how you have it, the device is going to be owned by the >> new domain, but not really usable. > This is not true: vpci_assign_device will call vpci_deassign_device > internally if it fails. So, the device won't be assigned in this case No. The device is assigned to whatever pdev->domain holds. Calling vpci_deassign_device() there merely makes sure that the device will have _no_ vPCI data and hooks in place, rather than something partial. >>> --- a/xen/drivers/vpci/vpci.c >>> +++ b/xen/drivers/vpci/vpci.c >>> @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev) >>> >>> return rc; >>> } >>> + >>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT >>> +/* Notify vPCI that device is assigned to guest. */ >>> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev) >>> +{ >>> + int rc; >>> + >>> + if ( !has_vpci(d) ) >>> + return 0; >>> + >>> + rc = vpci_add_handlers(pdev); >>> + if ( rc ) >>> + vpci_deassign_device(d, pdev); >>> + >>> + return rc; >>> +} >>> + >>> +/* Notify vPCI that device is de-assigned from guest. */ >>> +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) >>> +{ >>> + if ( !has_vpci(d) ) >>> + return; >>> + >>> + vpci_remove_device(pdev); >>> +} >>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ >> While for the latter function you look to need two parameters, do you >> really need them also in the former one? > Do you mean instead of passing d we could just use pdev->domain? > int vpci_assign_device(struct pci_dev *pdev) > +{ > + int rc; > + > + if ( !has_vpci(pdev->domain) ) > + return 0; Yes. > Yes, we probably can, but the rest of functions called from assign_device > are accepting both d and pdev, so not sure why would we want these > two be any different. Any good reason not to change others as well then? Yes: Prior to the call of the ->assign_device() hook, d != pdev->domain. It is the _purpose_ of this function to change ownership of the device. Jan
On 08.02.22 11:13, Jan Beulich wrote: > On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >> On 07.02.22 18:28, Jan Beulich wrote: >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>> pci_to_dev(pdev), flag); >>>> } >>>> >>>> + rc = vpci_assign_device(d, pdev); >>>> + >>>> done: >>>> if ( rc ) >>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>> There's no attempt to undo anything in the case of getting back an >>> error. ISTR this being deemed okay on the basis that the tool stack >>> would then take whatever action, but whatever it is that is supposed >>> to deal with errors here wants spelling out in the description. >> Why? I don't change the previously expected decision and implementation >> of the assign_device function: I use error paths as they were used before >> for the existing code. So, I see no clear reason to stress that the existing >> and new code relies on the toolstack > Saying half a sentence on this is helping review. Ok > >>> What's important is that no caller up the call tree may be left with >>> the impression that the device is still owned by the original >>> domain. With how you have it, the device is going to be owned by the >>> new domain, but not really usable. >> This is not true: vpci_assign_device will call vpci_deassign_device >> internally if it fails. So, the device won't be assigned in this case > No. The device is assigned to whatever pdev->domain holds. Calling > vpci_deassign_device() there merely makes sure that the device will > have _no_ vPCI data and hooks in place, rather than something > partial. So, this patch is only dealing with vpci assign/de-assign And it rolls back what it did in case of a failure It also returns rc in assign_device to signal it has failed What else is expected from this patch?? > >>>> --- a/xen/drivers/vpci/vpci.c >>>> +++ b/xen/drivers/vpci/vpci.c >>>> @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev) >>>> >>>> return rc; >>>> } >>>> + >>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT >>>> +/* Notify vPCI that device is assigned to guest. */ >>>> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev) >>>> +{ >>>> + int rc; >>>> + >>>> + if ( !has_vpci(d) ) >>>> + return 0; >>>> + >>>> + rc = vpci_add_handlers(pdev); >>>> + if ( rc ) >>>> + vpci_deassign_device(d, pdev); >>>> + >>>> + return rc; >>>> +} >>>> + >>>> +/* Notify vPCI that device is de-assigned from guest. */ >>>> +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) >>>> +{ >>>> + if ( !has_vpci(d) ) >>>> + return; >>>> + >>>> + vpci_remove_device(pdev); >>>> +} >>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ >>> While for the latter function you look to need two parameters, do you >>> really need them also in the former one? >> Do you mean instead of passing d we could just use pdev->domain? >> int vpci_assign_device(struct pci_dev *pdev) >> +{ >> + int rc; >> + >> + if ( !has_vpci(pdev->domain) ) >> + return 0; > Yes. > >> Yes, we probably can, but the rest of functions called from assign_device >> are accepting both d and pdev, so not sure why would we want these >> two be any different. Any good reason not to change others as well then? > Yes: Prior to the call of the ->assign_device() hook, d != pdev->domain. > It is the _purpose_ of this function to change ownership of the device. This can be done and makes sense. @Roger which way do you want this? > > Jan > Thank you, Oleksandr
On 08.02.2022 10:27, Oleksandr Andrushchenko wrote: > On 08.02.22 11:13, Jan Beulich wrote: >> On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >>> On 07.02.22 18:28, Jan Beulich wrote: >>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>> pci_to_dev(pdev), flag); >>>>> } >>>>> >>>>> + rc = vpci_assign_device(d, pdev); >>>>> + >>>>> done: >>>>> if ( rc ) >>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>> There's no attempt to undo anything in the case of getting back an >>>> error. ISTR this being deemed okay on the basis that the tool stack >>>> would then take whatever action, but whatever it is that is supposed >>>> to deal with errors here wants spelling out in the description. >>> Why? I don't change the previously expected decision and implementation >>> of the assign_device function: I use error paths as they were used before >>> for the existing code. So, I see no clear reason to stress that the existing >>> and new code relies on the toolstack >> Saying half a sentence on this is helping review. > Ok >> >>>> What's important is that no caller up the call tree may be left with >>>> the impression that the device is still owned by the original >>>> domain. With how you have it, the device is going to be owned by the >>>> new domain, but not really usable. >>> This is not true: vpci_assign_device will call vpci_deassign_device >>> internally if it fails. So, the device won't be assigned in this case >> No. The device is assigned to whatever pdev->domain holds. Calling >> vpci_deassign_device() there merely makes sure that the device will >> have _no_ vPCI data and hooks in place, rather than something >> partial. > So, this patch is only dealing with vpci assign/de-assign > And it rolls back what it did in case of a failure > It also returns rc in assign_device to signal it has failed > What else is expected from this patch?? Until now if assign_device() returns an error, this tells the caller that the device did not change ownership; in the worst case it either only moved to the quarantine domain, or the new owner may have been crashed. In no case is the device owned by an alive DomU. You're changing this property, and hence you need to make clear/sure that this isn't colliding with assumptions made elsewhere. Jan
On 08.02.22 11:44, Jan Beulich wrote: > On 08.02.2022 10:27, Oleksandr Andrushchenko wrote: >> On 08.02.22 11:13, Jan Beulich wrote: >>> On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >>>> On 07.02.22 18:28, Jan Beulich wrote: >>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>>> pci_to_dev(pdev), flag); >>>>>> } >>>>>> >>>>>> + rc = vpci_assign_device(d, pdev); >>>>>> + >>>>>> done: >>>>>> if ( rc ) >>>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>> There's no attempt to undo anything in the case of getting back an >>>>> error. ISTR this being deemed okay on the basis that the tool stack >>>>> would then take whatever action, but whatever it is that is supposed >>>>> to deal with errors here wants spelling out in the description. >>>> Why? I don't change the previously expected decision and implementation >>>> of the assign_device function: I use error paths as they were used before >>>> for the existing code. So, I see no clear reason to stress that the existing >>>> and new code relies on the toolstack >>> Saying half a sentence on this is helping review. >> Ok >>>>> What's important is that no caller up the call tree may be left with >>>>> the impression that the device is still owned by the original >>>>> domain. With how you have it, the device is going to be owned by the >>>>> new domain, but not really usable. >>>> This is not true: vpci_assign_device will call vpci_deassign_device >>>> internally if it fails. So, the device won't be assigned in this case >>> No. The device is assigned to whatever pdev->domain holds. Calling >>> vpci_deassign_device() there merely makes sure that the device will >>> have _no_ vPCI data and hooks in place, rather than something >>> partial. >> So, this patch is only dealing with vpci assign/de-assign >> And it rolls back what it did in case of a failure >> It also returns rc in assign_device to signal it has failed >> What else is expected from this patch?? > Until now if assign_device() returns an error, this tells the caller > that the device did not change ownership; Not sure this is the case: if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn, pci_to_dev(pdev), flag)) ) iommu_call can leave the new ownership even now without vpci_assign_device. My understanding is that the roll-back is expected to be performed by the toolstack and vpci_assign_device doesn't prevent that by returning rc. Even more, before we discussed that it would be good for vpci_assign_device to try recovering from a possible error early which is done by calling vpci_deassign_device internally. So, if you want the things to be clearly handled without relying on the toolstack then it is not vpci_assign_device introduced issue, but the existing one, which needs (if there is a good reason) to be fixed separately. I think that new code doesn't make things worse. At least > in the worst case it either > only moved to the quarantine domain, or the new owner may have been > crashed. In no case is the device owned by an alive DomU. You're > changing this property, and hence you need to make clear/sure that > this isn't colliding with assumptions made elsewhere. > > Jan > >
On 08.02.2022 10:55, Oleksandr Andrushchenko wrote: > > > On 08.02.22 11:44, Jan Beulich wrote: >> On 08.02.2022 10:27, Oleksandr Andrushchenko wrote: >>> On 08.02.22 11:13, Jan Beulich wrote: >>>> On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >>>>> On 07.02.22 18:28, Jan Beulich wrote: >>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>>>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>>>> pci_to_dev(pdev), flag); >>>>>>> } >>>>>>> >>>>>>> + rc = vpci_assign_device(d, pdev); >>>>>>> + >>>>>>> done: >>>>>>> if ( rc ) >>>>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>>> There's no attempt to undo anything in the case of getting back an >>>>>> error. ISTR this being deemed okay on the basis that the tool stack >>>>>> would then take whatever action, but whatever it is that is supposed >>>>>> to deal with errors here wants spelling out in the description. >>>>> Why? I don't change the previously expected decision and implementation >>>>> of the assign_device function: I use error paths as they were used before >>>>> for the existing code. So, I see no clear reason to stress that the existing >>>>> and new code relies on the toolstack >>>> Saying half a sentence on this is helping review. >>> Ok >>>>>> What's important is that no caller up the call tree may be left with >>>>>> the impression that the device is still owned by the original >>>>>> domain. With how you have it, the device is going to be owned by the >>>>>> new domain, but not really usable. >>>>> This is not true: vpci_assign_device will call vpci_deassign_device >>>>> internally if it fails. So, the device won't be assigned in this case >>>> No. The device is assigned to whatever pdev->domain holds. Calling >>>> vpci_deassign_device() there merely makes sure that the device will >>>> have _no_ vPCI data and hooks in place, rather than something >>>> partial. >>> So, this patch is only dealing with vpci assign/de-assign >>> And it rolls back what it did in case of a failure >>> It also returns rc in assign_device to signal it has failed >>> What else is expected from this patch?? >> Until now if assign_device() returns an error, this tells the caller >> that the device did not change ownership; > Not sure this is the case: > if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn, > pci_to_dev(pdev), flag)) ) > iommu_call can leave the new ownership even now without > vpci_assign_device. Did you check the actual hook functions for when exactly the ownership change happens. For both VT-d and AMD it is the last thing they do, when no error can occur anymore. My understanding is that the roll-back is > expected to be performed by the toolstack and vpci_assign_device > doesn't prevent that by returning rc. Even more, before we discussed > that it would be good for vpci_assign_device to try recovering from > a possible error early which is done by calling vpci_deassign_device > internally. Yes, but that's only part of it. It at least needs considering what effects have resulted from operations prior to vpci_assign_device(). Jan
On 08.02.22 12:09, Jan Beulich wrote: > On 08.02.2022 10:55, Oleksandr Andrushchenko wrote: >> >> On 08.02.22 11:44, Jan Beulich wrote: >>> On 08.02.2022 10:27, Oleksandr Andrushchenko wrote: >>>> On 08.02.22 11:13, Jan Beulich wrote: >>>>> On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >>>>>> On 07.02.22 18:28, Jan Beulich wrote: >>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>>>>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>>>>> pci_to_dev(pdev), flag); >>>>>>>> } >>>>>>>> >>>>>>>> + rc = vpci_assign_device(d, pdev); >>>>>>>> + >>>>>>>> done: >>>>>>>> if ( rc ) >>>>>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>>>> There's no attempt to undo anything in the case of getting back an >>>>>>> error. ISTR this being deemed okay on the basis that the tool stack >>>>>>> would then take whatever action, but whatever it is that is supposed >>>>>>> to deal with errors here wants spelling out in the description. >>>>>> Why? I don't change the previously expected decision and implementation >>>>>> of the assign_device function: I use error paths as they were used before >>>>>> for the existing code. So, I see no clear reason to stress that the existing >>>>>> and new code relies on the toolstack >>>>> Saying half a sentence on this is helping review. >>>> Ok >>>>>>> What's important is that no caller up the call tree may be left with >>>>>>> the impression that the device is still owned by the original >>>>>>> domain. With how you have it, the device is going to be owned by the >>>>>>> new domain, but not really usable. >>>>>> This is not true: vpci_assign_device will call vpci_deassign_device >>>>>> internally if it fails. So, the device won't be assigned in this case >>>>> No. The device is assigned to whatever pdev->domain holds. Calling >>>>> vpci_deassign_device() there merely makes sure that the device will >>>>> have _no_ vPCI data and hooks in place, rather than something >>>>> partial. >>>> So, this patch is only dealing with vpci assign/de-assign >>>> And it rolls back what it did in case of a failure >>>> It also returns rc in assign_device to signal it has failed >>>> What else is expected from this patch?? >>> Until now if assign_device() returns an error, this tells the caller >>> that the device did not change ownership; >> Not sure this is the case: >> if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn, >> pci_to_dev(pdev), flag)) ) >> iommu_call can leave the new ownership even now without >> vpci_assign_device. > Did you check the actual hook functions for when exactly the ownership > change happens. For both VT-d and AMD it is the last thing they do, > when no error can occur anymore. This functionality does not exist for Arm yet, so this is up to the future series to add that. WRT to the existing code: static int amd_iommu_assign_device(struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag) { if ( !rc ) rc = reassign_device(pdev->domain, d, devfn, pdev); <<<<< this will set pdev->domain if ( rc && !is_hardware_domain(d) ) { int ret = amd_iommu_reserve_domain_unity_unmap( d, ivrs_mappings[req_id].unity_map); if ( ret ) { printk(XENLOG_ERR "AMD-Vi: " "unity-unmap for %pd/%04x:%02x:%02x.%u failed (%d)\n", d, pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret); domain_crash(d); } So.... This is IMO wrong in the first place to let IOMMU code assign pdev->domain. This is something that needs to be done by the PCI code itself and not relying on each IOMMU callback implementation > > My understanding is that the roll-back is >> expected to be performed by the toolstack and vpci_assign_device >> doesn't prevent that by returning rc. Even more, before we discussed >> that it would be good for vpci_assign_device to try recovering from >> a possible error early which is done by calling vpci_deassign_device >> internally. > Yes, but that's only part of it. It at least needs considering what > effects have resulted from operations prior to vpci_assign_device(). Taking into account the code snippet above: what is your expectation from this patch with this respect? > > Jan > Thank you, Oleksandr
On 08.02.2022 11:22, Oleksandr Andrushchenko wrote: > > > On 08.02.22 12:09, Jan Beulich wrote: >> On 08.02.2022 10:55, Oleksandr Andrushchenko wrote: >>> >>> On 08.02.22 11:44, Jan Beulich wrote: >>>> On 08.02.2022 10:27, Oleksandr Andrushchenko wrote: >>>>> On 08.02.22 11:13, Jan Beulich wrote: >>>>>> On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >>>>>>> On 07.02.22 18:28, Jan Beulich wrote: >>>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>>>>>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>>>>>> pci_to_dev(pdev), flag); >>>>>>>>> } >>>>>>>>> >>>>>>>>> + rc = vpci_assign_device(d, pdev); >>>>>>>>> + >>>>>>>>> done: >>>>>>>>> if ( rc ) >>>>>>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>>>>> There's no attempt to undo anything in the case of getting back an >>>>>>>> error. ISTR this being deemed okay on the basis that the tool stack >>>>>>>> would then take whatever action, but whatever it is that is supposed >>>>>>>> to deal with errors here wants spelling out in the description. >>>>>>> Why? I don't change the previously expected decision and implementation >>>>>>> of the assign_device function: I use error paths as they were used before >>>>>>> for the existing code. So, I see no clear reason to stress that the existing >>>>>>> and new code relies on the toolstack >>>>>> Saying half a sentence on this is helping review. >>>>> Ok >>>>>>>> What's important is that no caller up the call tree may be left with >>>>>>>> the impression that the device is still owned by the original >>>>>>>> domain. With how you have it, the device is going to be owned by the >>>>>>>> new domain, but not really usable. >>>>>>> This is not true: vpci_assign_device will call vpci_deassign_device >>>>>>> internally if it fails. So, the device won't be assigned in this case >>>>>> No. The device is assigned to whatever pdev->domain holds. Calling >>>>>> vpci_deassign_device() there merely makes sure that the device will >>>>>> have _no_ vPCI data and hooks in place, rather than something >>>>>> partial. >>>>> So, this patch is only dealing with vpci assign/de-assign >>>>> And it rolls back what it did in case of a failure >>>>> It also returns rc in assign_device to signal it has failed >>>>> What else is expected from this patch?? >>>> Until now if assign_device() returns an error, this tells the caller >>>> that the device did not change ownership; >>> Not sure this is the case: >>> if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn, >>> pci_to_dev(pdev), flag)) ) >>> iommu_call can leave the new ownership even now without >>> vpci_assign_device. >> Did you check the actual hook functions for when exactly the ownership >> change happens. For both VT-d and AMD it is the last thing they do, >> when no error can occur anymore. > This functionality does not exist for Arm yet, so this is up to the > future series to add that. > > WRT to the existing code: > > static int amd_iommu_assign_device(struct domain *d, u8 devfn, > struct pci_dev *pdev, > u32 flag) > { > if ( !rc ) > rc = reassign_device(pdev->domain, d, devfn, pdev); <<<<< this will set pdev->domain > > if ( rc && !is_hardware_domain(d) ) > { > int ret = amd_iommu_reserve_domain_unity_unmap( > d, ivrs_mappings[req_id].unity_map); > > if ( ret ) > { > printk(XENLOG_ERR "AMD-Vi: " > "unity-unmap for %pd/%04x:%02x:%02x.%u failed (%d)\n", > d, pdev->seg, pdev->bus, > PCI_SLOT(devfn), PCI_FUNC(devfn), ret); > domain_crash(d); > } > So.... > > This is IMO wrong in the first place to let IOMMU code assign pdev->domain. > This is something that needs to be done by the PCI code itself and > not relying on each IOMMU callback implementation >> >> My understanding is that the roll-back is >>> expected to be performed by the toolstack and vpci_assign_device >>> doesn't prevent that by returning rc. Even more, before we discussed >>> that it would be good for vpci_assign_device to try recovering from >>> a possible error early which is done by calling vpci_deassign_device >>> internally. >> Yes, but that's only part of it. It at least needs considering what >> effects have resulted from operations prior to vpci_assign_device(). > Taking into account the code snippet above: what is your expectation > from this patch with this respect? You did note the domain_crash() in there, didn't you? The snippet above still matches the "device not assigned to an alive DomU" criteria (which can be translated to "no exposure of a device to an untrusted entity in case of error"). Such domain_crash() uses aren't nice, and I'd prefer to see them go away, but said property needs to be retained with any alternative solutions. Jan
On 08.02.22 12:29, Jan Beulich wrote: > On 08.02.2022 11:22, Oleksandr Andrushchenko wrote: >> >> On 08.02.22 12:09, Jan Beulich wrote: >>> On 08.02.2022 10:55, Oleksandr Andrushchenko wrote: >>>> On 08.02.22 11:44, Jan Beulich wrote: >>>>> On 08.02.2022 10:27, Oleksandr Andrushchenko wrote: >>>>>> On 08.02.22 11:13, Jan Beulich wrote: >>>>>>> On 08.02.2022 09:32, Oleksandr Andrushchenko wrote: >>>>>>>> On 07.02.22 18:28, Jan Beulich wrote: >>>>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>>>>>>>>> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>>>>>>> pci_to_dev(pdev), flag); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> + rc = vpci_assign_device(d, pdev); >>>>>>>>>> + >>>>>>>>>> done: >>>>>>>>>> if ( rc ) >>>>>>>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>>>>>> There's no attempt to undo anything in the case of getting back an >>>>>>>>> error. ISTR this being deemed okay on the basis that the tool stack >>>>>>>>> would then take whatever action, but whatever it is that is supposed >>>>>>>>> to deal with errors here wants spelling out in the description. >>>>>>>> Why? I don't change the previously expected decision and implementation >>>>>>>> of the assign_device function: I use error paths as they were used before >>>>>>>> for the existing code. So, I see no clear reason to stress that the existing >>>>>>>> and new code relies on the toolstack >>>>>>> Saying half a sentence on this is helping review. >>>>>> Ok >>>>>>>>> What's important is that no caller up the call tree may be left with >>>>>>>>> the impression that the device is still owned by the original >>>>>>>>> domain. With how you have it, the device is going to be owned by the >>>>>>>>> new domain, but not really usable. >>>>>>>> This is not true: vpci_assign_device will call vpci_deassign_device >>>>>>>> internally if it fails. So, the device won't be assigned in this case >>>>>>> No. The device is assigned to whatever pdev->domain holds. Calling >>>>>>> vpci_deassign_device() there merely makes sure that the device will >>>>>>> have _no_ vPCI data and hooks in place, rather than something >>>>>>> partial. >>>>>> So, this patch is only dealing with vpci assign/de-assign >>>>>> And it rolls back what it did in case of a failure >>>>>> It also returns rc in assign_device to signal it has failed >>>>>> What else is expected from this patch?? >>>>> Until now if assign_device() returns an error, this tells the caller >>>>> that the device did not change ownership; >>>> Not sure this is the case: >>>> if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn, >>>> pci_to_dev(pdev), flag)) ) >>>> iommu_call can leave the new ownership even now without >>>> vpci_assign_device. >>> Did you check the actual hook functions for when exactly the ownership >>> change happens. For both VT-d and AMD it is the last thing they do, >>> when no error can occur anymore. >> This functionality does not exist for Arm yet, so this is up to the >> future series to add that. >> >> WRT to the existing code: >> >> static int amd_iommu_assign_device(struct domain *d, u8 devfn, >> struct pci_dev *pdev, >> u32 flag) >> { >> if ( !rc ) >> rc = reassign_device(pdev->domain, d, devfn, pdev); <<<<< this will set pdev->domain >> >> if ( rc && !is_hardware_domain(d) ) >> { >> int ret = amd_iommu_reserve_domain_unity_unmap( >> d, ivrs_mappings[req_id].unity_map); >> >> if ( ret ) >> { >> printk(XENLOG_ERR "AMD-Vi: " >> "unity-unmap for %pd/%04x:%02x:%02x.%u failed (%d)\n", >> d, pdev->seg, pdev->bus, >> PCI_SLOT(devfn), PCI_FUNC(devfn), ret); >> domain_crash(d); >> } >> So.... >> >> This is IMO wrong in the first place to let IOMMU code assign pdev->domain. >> This is something that needs to be done by the PCI code itself and >> not relying on each IOMMU callback implementation >>> My understanding is that the roll-back is >>>> expected to be performed by the toolstack and vpci_assign_device >>>> doesn't prevent that by returning rc. Even more, before we discussed >>>> that it would be good for vpci_assign_device to try recovering from >>>> a possible error early which is done by calling vpci_deassign_device >>>> internally. >>> Yes, but that's only part of it. It at least needs considering what >>> effects have resulted from operations prior to vpci_assign_device(). >> Taking into account the code snippet above: what is your expectation >> from this patch with this respect? > You did note the domain_crash() in there, didn't you? Which is AMD specific implementation which can be different for other IOMMUs. Yes, I did. > The snippet above > still matches the "device not assigned to an alive DomU" criteria (which > can be translated to "no exposure of a device to an untrusted entity in > case of error"). Such domain_crash() uses aren't nice, and I'd prefer to > see them go away, but said property needs to be retained with any > alternative solutions. This smells like we first need to fix the existing code, so pdev->domain is not assigned by specific IOMMU implementations, but instead controlled by the code which relies on that, assign_device. I can have something like: diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 88836aab6baf..cc7790709a50 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn) static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) { const struct domain_iommu *hd = dom_iommu(d); + struct domain *old_owner; struct pci_dev *pdev; int rc = 0; @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) ASSERT(pdev && (pdev->domain == hardware_domain || pdev->domain == dom_io)); + /* We need to restore the old owner in case of an error. */ + old_owner = pdev->domain; + vpci_deassign_device(pdev->domain, pdev); rc = pdev_msix_assign(d, pdev); @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) done: if ( rc ) + { printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", d, &PCI_SBDF3(seg, bus, devfn), rc); + /* We failed to assign, so restore the previous owner. */ + pdev->domain = old_owner; + } /* The device is assigned to dom_io so mark it as quarantined */ else if ( d == dom_io ) pdev->quarantine = true; But I do not think this belongs to this patch > > Jan > Thank you, Oleksandr
On 08.02.2022 11:52, Oleksandr Andrushchenko wrote: > This smells like we first need to fix the existing code, so > pdev->domain is not assigned by specific IOMMU implementations, > but instead controlled by the code which relies on that, assign_device. Feel free to come up with proposals how to cleanly do so. Moving the assignment to pdev->domain may even be possible now, but if you go back you may find that the code was quite different earlier on. > I can have something like: > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index 88836aab6baf..cc7790709a50 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn) > static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > { > const struct domain_iommu *hd = dom_iommu(d); > + struct domain *old_owner; > struct pci_dev *pdev; > int rc = 0; > > @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > ASSERT(pdev && (pdev->domain == hardware_domain || > pdev->domain == dom_io)); > > + /* We need to restore the old owner in case of an error. */ > + old_owner = pdev->domain; > + > vpci_deassign_device(pdev->domain, pdev); > > rc = pdev_msix_assign(d, pdev); > @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > > done: > if ( rc ) > + { > printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", > d, &PCI_SBDF3(seg, bus, devfn), rc); > + /* We failed to assign, so restore the previous owner. */ > + pdev->domain = old_owner; > + } > /* The device is assigned to dom_io so mark it as quarantined */ > else if ( d == dom_io ) > pdev->quarantine = true; > > But I do not think this belongs to this patch Indeed. Plus I'm sure you understand that it's not that simple. Assigning to pdev->domain is only the last step of assignment. Restoring the original owner would entail putting in place the original IOMMU table entries as well, which in turn can fail. Hence why you'll find a number of uses of domain_crash() in places where rolling back is far from easy. Jan
On 08.02.22 13:00, Jan Beulich wrote: > On 08.02.2022 11:52, Oleksandr Andrushchenko wrote: >> This smells like we first need to fix the existing code, so >> pdev->domain is not assigned by specific IOMMU implementations, >> but instead controlled by the code which relies on that, assign_device. > Feel free to come up with proposals how to cleanly do so. Moving the > assignment to pdev->domain may even be possible now, but if you go > back you may find that the code was quite different earlier on. I do understand that as the code evolves new use cases bring new issues. > >> I can have something like: >> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >> index 88836aab6baf..cc7790709a50 100644 >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn) >> static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >> { >> const struct domain_iommu *hd = dom_iommu(d); >> + struct domain *old_owner; >> struct pci_dev *pdev; >> int rc = 0; >> >> @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >> ASSERT(pdev && (pdev->domain == hardware_domain || >> pdev->domain == dom_io)); >> >> + /* We need to restore the old owner in case of an error. */ >> + old_owner = pdev->domain; >> + >> vpci_deassign_device(pdev->domain, pdev); >> >> rc = pdev_msix_assign(d, pdev); >> @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >> >> done: >> if ( rc ) >> + { >> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >> d, &PCI_SBDF3(seg, bus, devfn), rc); >> + /* We failed to assign, so restore the previous owner. */ >> + pdev->domain = old_owner; >> + } >> /* The device is assigned to dom_io so mark it as quarantined */ >> else if ( d == dom_io ) >> pdev->quarantine = true; >> >> But I do not think this belongs to this patch > Indeed. Plus I'm sure you understand that it's not that simple. Assigning > to pdev->domain is only the last step of assignment. Restoring the original > owner would entail putting in place the original IOMMU table entries as > well, which in turn can fail. Hence why you'll find a number of uses of > domain_crash() in places where rolling back is far from easy. So, why don't we just rely on the toolstack to do the roll back then? This way we won't add new domain_crash() calls. I do understand though that we may live Xen in a wrong state though. So, do you think it is possible if we just call deassign_device from assign_device on the error path? This is just like I do in vpci_assign_device: I call vpci_deassign_device if the former fails. > Jan > Thank you, Oleksandr
On 08.02.22 13:25, Oleksandr Andrushchenko wrote: > > On 08.02.22 13:00, Jan Beulich wrote: >> On 08.02.2022 11:52, Oleksandr Andrushchenko wrote: >>> This smells like we first need to fix the existing code, so >>> pdev->domain is not assigned by specific IOMMU implementations, >>> but instead controlled by the code which relies on that, assign_device. >> Feel free to come up with proposals how to cleanly do so. Moving the >> assignment to pdev->domain may even be possible now, but if you go >> back you may find that the code was quite different earlier on. > I do understand that as the code evolves new use cases bring > new issues. >>> I can have something like: >>> >>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >>> index 88836aab6baf..cc7790709a50 100644 >>> --- a/xen/drivers/passthrough/pci.c >>> +++ b/xen/drivers/passthrough/pci.c >>> @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn) >>> static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>> { >>> const struct domain_iommu *hd = dom_iommu(d); >>> + struct domain *old_owner; >>> struct pci_dev *pdev; >>> int rc = 0; >>> >>> @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>> ASSERT(pdev && (pdev->domain == hardware_domain || >>> pdev->domain == dom_io)); >>> >>> + /* We need to restore the old owner in case of an error. */ >>> + old_owner = pdev->domain; >>> + >>> vpci_deassign_device(pdev->domain, pdev); >>> >>> rc = pdev_msix_assign(d, pdev); >>> @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>> >>> done: >>> if ( rc ) >>> + { >>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>> d, &PCI_SBDF3(seg, bus, devfn), rc); >>> + /* We failed to assign, so restore the previous owner. */ >>> + pdev->domain = old_owner; >>> + } >>> /* The device is assigned to dom_io so mark it as quarantined */ >>> else if ( d == dom_io ) >>> pdev->quarantine = true; >>> >>> But I do not think this belongs to this patch >> Indeed. Plus I'm sure you understand that it's not that simple. Assigning >> to pdev->domain is only the last step of assignment. Restoring the original >> owner would entail putting in place the original IOMMU table entries as >> well, which in turn can fail. Hence why you'll find a number of uses of >> domain_crash() in places where rolling back is far from easy. > So, why don't we just rely on the toolstack to do the roll back then? > This way we won't add new domain_crash() calls. > I do understand though that we may live Xen in a wrong state though. > So, do you think it is possible if we just call deassign_device from > assign_device on the error path? This is just like I do in vpci_assign_device: > I call vpci_deassign_device if the former fails. With the following addition: diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index c4ae22aeefcd..d6c00449193c 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1511,6 +1511,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) } rc = vpci_assign_device(pdev); + if ( rc ) + /* + * Ignore the return code as we want to preserve the one from the + * failed assign operation. + */ + deassign_device(d, seg, bus, devfn); done: if ( rc ) I see the following logs (PV Dom0): (XEN) assign_device seg 0 bus 3 devfn 0 (XEN) [VT-D]d[IO]:PCIe: unmap 0000:03:00.0 (XEN) [VT-D]d4:PCIe: map 0000:03:00.0 (XEN) assign_device vpci_assign rc -22 from d[IO] to d4 (XEN) deassign_device current d4 to d[IO] (XEN) [VT-D]d4:PCIe: unmap 0000:03:00.0 (XEN) [VT-D]d[IO]:PCIe: map 0000:03:00.0 (XEN) deassign_device ret 0 (XEN) d4: assign (0000:03:00.0) failed (-22) libxl: error: libxl_pci.c:1498:pci_add_dm_done: Domain 4:xc_assign_device failed: Invalid argument libxl: error: libxl_pci.c:1781:device_pci_add_done: Domain 4:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3) libxl: error: libxl_create.c:1895:domcreate_attach_devices: Domain 4:unable to add pci devices libxl: error: libxl_domain.c:1183:libxl__destroy_domid: Domain 4:Non-existant domain libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 4:Unable to destroy guest libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 4:Destruction of domain failed So, it seems to properly solve the issue with pdev->domain left set to the domain we couldn't create. @Jan, will this address your concern? Thank you, Oleksandr
On 10.02.2022 09:21, Oleksandr Andrushchenko wrote: > > > On 08.02.22 13:25, Oleksandr Andrushchenko wrote: >> >> On 08.02.22 13:00, Jan Beulich wrote: >>> On 08.02.2022 11:52, Oleksandr Andrushchenko wrote: >>>> This smells like we first need to fix the existing code, so >>>> pdev->domain is not assigned by specific IOMMU implementations, >>>> but instead controlled by the code which relies on that, assign_device. >>> Feel free to come up with proposals how to cleanly do so. Moving the >>> assignment to pdev->domain may even be possible now, but if you go >>> back you may find that the code was quite different earlier on. >> I do understand that as the code evolves new use cases bring >> new issues. >>>> I can have something like: >>>> >>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >>>> index 88836aab6baf..cc7790709a50 100644 >>>> --- a/xen/drivers/passthrough/pci.c >>>> +++ b/xen/drivers/passthrough/pci.c >>>> @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn) >>>> static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>> { >>>> const struct domain_iommu *hd = dom_iommu(d); >>>> + struct domain *old_owner; >>>> struct pci_dev *pdev; >>>> int rc = 0; >>>> >>>> @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>> ASSERT(pdev && (pdev->domain == hardware_domain || >>>> pdev->domain == dom_io)); >>>> >>>> + /* We need to restore the old owner in case of an error. */ >>>> + old_owner = pdev->domain; >>>> + >>>> vpci_deassign_device(pdev->domain, pdev); >>>> >>>> rc = pdev_msix_assign(d, pdev); >>>> @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>> >>>> done: >>>> if ( rc ) >>>> + { >>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>> d, &PCI_SBDF3(seg, bus, devfn), rc); >>>> + /* We failed to assign, so restore the previous owner. */ >>>> + pdev->domain = old_owner; >>>> + } >>>> /* The device is assigned to dom_io so mark it as quarantined */ >>>> else if ( d == dom_io ) >>>> pdev->quarantine = true; >>>> >>>> But I do not think this belongs to this patch >>> Indeed. Plus I'm sure you understand that it's not that simple. Assigning >>> to pdev->domain is only the last step of assignment. Restoring the original >>> owner would entail putting in place the original IOMMU table entries as >>> well, which in turn can fail. Hence why you'll find a number of uses of >>> domain_crash() in places where rolling back is far from easy. >> So, why don't we just rely on the toolstack to do the roll back then? >> This way we won't add new domain_crash() calls. >> I do understand though that we may live Xen in a wrong state though. >> So, do you think it is possible if we just call deassign_device from >> assign_device on the error path? This is just like I do in vpci_assign_device: >> I call vpci_deassign_device if the former fails. > With the following addition: > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index c4ae22aeefcd..d6c00449193c 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -1511,6 +1511,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) > } > > rc = vpci_assign_device(pdev); > + if ( rc ) > + /* > + * Ignore the return code as we want to preserve the one from the > + * failed assign operation. > + */ > + deassign_device(d, seg, bus, devfn); > > done: > if ( rc ) > > I see the following logs (PV Dom0): > > (XEN) assign_device seg 0 bus 3 devfn 0 > (XEN) [VT-D]d[IO]:PCIe: unmap 0000:03:00.0 > (XEN) [VT-D]d4:PCIe: map 0000:03:00.0 > (XEN) assign_device vpci_assign rc -22 from d[IO] to d4 > (XEN) deassign_device current d4 to d[IO] > (XEN) [VT-D]d4:PCIe: unmap 0000:03:00.0 > (XEN) [VT-D]d[IO]:PCIe: map 0000:03:00.0 > (XEN) deassign_device ret 0 > (XEN) d4: assign (0000:03:00.0) failed (-22) > libxl: error: libxl_pci.c:1498:pci_add_dm_done: Domain 4:xc_assign_device failed: Invalid argument > libxl: error: libxl_pci.c:1781:device_pci_add_done: Domain 4:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3) > libxl: error: libxl_create.c:1895:domcreate_attach_devices: Domain 4:unable to add pci devices > libxl: error: libxl_domain.c:1183:libxl__destroy_domid: Domain 4:Non-existant domain > libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 4:Unable to destroy guest > libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 4:Destruction of domain failed > > So, it seems to properly solve the issue with pdev->domain left > set to the domain we couldn't create. > > @Jan, will this address your concern? Partly: For one I'd have to think through what further implications there are from going this route. And then completely ignoring the return value is unlikely to be correct: You certainly want to retain the original error code for returning to the caller, but you can't leave the error unhandled. That's likely another case where the "best" choice is to crash the guest. Jan
On 10.02.22 11:22, Jan Beulich wrote: > On 10.02.2022 09:21, Oleksandr Andrushchenko wrote: >> >> On 08.02.22 13:25, Oleksandr Andrushchenko wrote: >>> On 08.02.22 13:00, Jan Beulich wrote: >>>> On 08.02.2022 11:52, Oleksandr Andrushchenko wrote: >>>>> This smells like we first need to fix the existing code, so >>>>> pdev->domain is not assigned by specific IOMMU implementations, >>>>> but instead controlled by the code which relies on that, assign_device. >>>> Feel free to come up with proposals how to cleanly do so. Moving the >>>> assignment to pdev->domain may even be possible now, but if you go >>>> back you may find that the code was quite different earlier on. >>> I do understand that as the code evolves new use cases bring >>> new issues. >>>>> I can have something like: >>>>> >>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >>>>> index 88836aab6baf..cc7790709a50 100644 >>>>> --- a/xen/drivers/passthrough/pci.c >>>>> +++ b/xen/drivers/passthrough/pci.c >>>>> @@ -1475,6 +1475,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn) >>>>> static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>> { >>>>> const struct domain_iommu *hd = dom_iommu(d); >>>>> + struct domain *old_owner; >>>>> struct pci_dev *pdev; >>>>> int rc = 0; >>>>> >>>>> @@ -1490,6 +1491,9 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>> ASSERT(pdev && (pdev->domain == hardware_domain || >>>>> pdev->domain == dom_io)); >>>>> >>>>> + /* We need to restore the old owner in case of an error. */ >>>>> + old_owner = pdev->domain; >>>>> + >>>>> vpci_deassign_device(pdev->domain, pdev); >>>>> >>>>> rc = pdev_msix_assign(d, pdev); >>>>> @@ -1515,8 +1519,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >>>>> >>>>> done: >>>>> if ( rc ) >>>>> + { >>>>> printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", >>>>> d, &PCI_SBDF3(seg, bus, devfn), rc); >>>>> + /* We failed to assign, so restore the previous owner. */ >>>>> + pdev->domain = old_owner; >>>>> + } >>>>> /* The device is assigned to dom_io so mark it as quarantined */ >>>>> else if ( d == dom_io ) >>>>> pdev->quarantine = true; >>>>> >>>>> But I do not think this belongs to this patch >>>> Indeed. Plus I'm sure you understand that it's not that simple. Assigning >>>> to pdev->domain is only the last step of assignment. Restoring the original >>>> owner would entail putting in place the original IOMMU table entries as >>>> well, which in turn can fail. Hence why you'll find a number of uses of >>>> domain_crash() in places where rolling back is far from easy. >>> So, why don't we just rely on the toolstack to do the roll back then? >>> This way we won't add new domain_crash() calls. >>> I do understand though that we may live Xen in a wrong state though. >>> So, do you think it is possible if we just call deassign_device from >>> assign_device on the error path? This is just like I do in vpci_assign_device: >>> I call vpci_deassign_device if the former fails. >> With the following addition: >> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >> index c4ae22aeefcd..d6c00449193c 100644 >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -1511,6 +1511,12 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) >> } >> >> rc = vpci_assign_device(pdev); >> + if ( rc ) >> + /* >> + * Ignore the return code as we want to preserve the one from the >> + * failed assign operation. >> + */ >> + deassign_device(d, seg, bus, devfn); This needs devfn to be preserved as it can be modified by the loop above: for ( ; pdev->phantom_stride; rc = 0 ) { devfn += pdev->phantom_stride; >> >> done: >> if ( rc ) >> >> I see the following logs (PV Dom0): >> >> (XEN) assign_device seg 0 bus 3 devfn 0 >> (XEN) [VT-D]d[IO]:PCIe: unmap 0000:03:00.0 >> (XEN) [VT-D]d4:PCIe: map 0000:03:00.0 >> (XEN) assign_device vpci_assign rc -22 from d[IO] to d4 >> (XEN) deassign_device current d4 to d[IO] >> (XEN) [VT-D]d4:PCIe: unmap 0000:03:00.0 >> (XEN) [VT-D]d[IO]:PCIe: map 0000:03:00.0 >> (XEN) deassign_device ret 0 >> (XEN) d4: assign (0000:03:00.0) failed (-22) >> libxl: error: libxl_pci.c:1498:pci_add_dm_done: Domain 4:xc_assign_device failed: Invalid argument >> libxl: error: libxl_pci.c:1781:device_pci_add_done: Domain 4:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3) >> libxl: error: libxl_create.c:1895:domcreate_attach_devices: Domain 4:unable to add pci devices >> libxl: error: libxl_domain.c:1183:libxl__destroy_domid: Domain 4:Non-existant domain >> libxl: error: libxl_domain.c:1137:domain_destroy_callback: Domain 4:Unable to destroy guest >> libxl: error: libxl_domain.c:1064:domain_destroy_cb: Domain 4:Destruction of domain failed >> >> So, it seems to properly solve the issue with pdev->domain left >> set to the domain we couldn't create. >> >> @Jan, will this address your concern? > Partly: For one I'd have to think through what further implications there > are from going this route. And then completely ignoring the return value > is unlikely to be correct: You certainly want to retain the original > error code for returning to the caller, but you can't leave the error > unhandled. That's likely another case where the "best" choice is to crash > the guest. Ok, then I'll crash the domain... > > Jan > > Thank you, Oleksandr
diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig index db94393f47a6..780490cf8e39 100644 --- a/xen/drivers/Kconfig +++ b/xen/drivers/Kconfig @@ -15,4 +15,8 @@ source "drivers/video/Kconfig" config HAS_VPCI bool +config HAS_VPCI_GUEST_SUPPORT + bool + depends on HAS_VPCI + endmenu diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 50dec3bb73d0..88836aab6baf 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -943,6 +943,8 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus, if ( ret ) goto out; + vpci_deassign_device(d, pdev); + if ( pdev->domain == hardware_domain ) pdev->quarantine = false; @@ -1488,6 +1490,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) ASSERT(pdev && (pdev->domain == hardware_domain || pdev->domain == dom_io)); + vpci_deassign_device(pdev->domain, pdev); + rc = pdev_msix_assign(d, pdev); if ( rc ) goto done; @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) pci_to_dev(pdev), flag); } + rc = vpci_assign_device(d, pdev); + done: if ( rc ) printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n", diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c index f8a93e61c08f..4e774875fa04 100644 --- a/xen/drivers/vpci/vpci.c +++ b/xen/drivers/vpci/vpci.c @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev) return rc; } + +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT +/* Notify vPCI that device is assigned to guest. */ +int vpci_assign_device(struct domain *d, struct pci_dev *pdev) +{ + int rc; + + if ( !has_vpci(d) ) + return 0; + + rc = vpci_add_handlers(pdev); + if ( rc ) + vpci_deassign_device(d, pdev); + + return rc; +} + +/* Notify vPCI that device is de-assigned from guest. */ +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) +{ + if ( !has_vpci(d) ) + return; + + vpci_remove_device(pdev); +} +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */ + #endif /* __XEN__ */ static int vpci_register_cmp(const struct vpci_register *r1, diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h index f2a7d82ce77b..246307e6f5d5 100644 --- a/xen/include/xen/vpci.h +++ b/xen/include/xen/vpci.h @@ -251,6 +251,21 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v) } #endif +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT +/* Notify vPCI that device is assigned/de-assigned to/from guest. */ +int vpci_assign_device(struct domain *d, struct pci_dev *pdev); +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev); +#else +static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev) +{ + return 0; +}; + +static inline void vpci_deassign_device(struct domain *d, struct pci_dev *pdev) +{ +}; +#endif + #endif /*