Message ID | 20211105065629.940943-7-andr2000@gmail.com (mailing list archive)
---|---
State | Superseded
Series | PCI devices passthrough on Arm, part 3
On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Instead of handling a single range set, that contains all the memory
> regions of all the BARs and ROM, have them per BAR.

Iirc Roger did indicate agreement with the splitting. May I nevertheless
ask that for posterity you say a word here about the overhead, to make
clear this was a conscious decision?

Jan
On 19.11.21 14:05, Jan Beulich wrote:
> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Instead of handling a single range set, that contains all the memory
>> regions of all the BARs and ROM, have them per BAR.
> Iirc Roger did indicate agreement with the splitting. May I nevertheless
> ask that for posterity you say a word here about the overhead, to make
> clear this was a conscious decision?
Sure, but could you please help me with that sentence so that it pleases
your eye? I mean that it was you who saw the overhead while I did not:
were we to implement similar functionality ourselves, I still think we
would end up duplicating rangesets at the end of the day.
> Jan
>
Thank you in advance,
Oleksandr
On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
> On 19.11.21 14:05, Jan Beulich wrote:
>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Instead of handling a single range set, that contains all the memory
>>> regions of all the BARs and ROM, have them per BAR.
>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>> ask that for posterity you say a word here about the overhead, to make
>> clear this was a conscious decision?
> Sure, but could you please help me with that sentence so that it pleases
> your eye? I mean that it was you who saw the overhead while I did not:
> were we to implement similar functionality ourselves, I still think we
> would end up duplicating rangesets at the end of the day.

"Note that rangesets were chosen here despite there being only up to
<N> separate ranges in each set (typically just 1)." Albeit that's
then still lacking a justification for the choice. Ease of
implementation?

As to overhead - did you compare sizeof(struct rangeset) + N *
sizeof(struct range) with just N * sizeof(unsigned long [2])?

Jan
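For a rough, standalone way to eyeball the numbers Jan asks about, the sketch below can be compiled as-is; the struct layouts are simplified stand-ins for the xen/common/rangeset.c internals (the real ones embed list_head types and a lock), so treat the resulting sizes as approximate:

    #include <stdio.h>

    /* Simplified stand-ins for Xen's rangeset internals. */
    struct range {
        void *next, *prev;      /* list linkage */
        unsigned long s, e;     /* inclusive start/end of one range */
    };

    struct rangeset {
        void *next, *prev;      /* linkage on the domain's rangeset list */
        void *domain;
        void *head, *tail;      /* list of struct range */
        unsigned int nr_ranges;
        unsigned int flags;
        char name[32];
    };

    int main(void)
    {
        for ( unsigned int n = 1; n <= 3; n++ )
            printf("N=%u: rangeset ~%zu bytes vs. bare pairs %zu bytes\n", n,
                   sizeof(struct rangeset) + n * sizeof(struct range),
                   n * sizeof(unsigned long [2]));
        return 0;
    }

On a 64-bit build the per-set bookkeeping dominates for N this small, which is exactly the overhead being weighed against re-using existing, well-tested code.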
On 19.11.21 14:45, Jan Beulich wrote:
> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>> On 19.11.21 14:05, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Instead of handling a single range set, that contains all the memory
>>>> regions of all the BARs and ROM, have them per BAR.
>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>> ask that for posterity you say a word here about the overhead, to make
>>> clear this was a conscious decision?
>> Sure, but could you please help me with that sentence so that it pleases
>> your eye? I mean that it was you who saw the overhead while I did not:
>> were we to implement similar functionality ourselves, I still think we
>> would end up duplicating rangesets at the end of the day.
> "Note that rangesets were chosen here despite there being only up to
> <N> separate ranges in each set (typically just 1)." Albeit that's
> then still lacking a justification for the choice. Ease of
> implementation?
I guess yes. I'll put:

"Note that rangesets were chosen here despite there being only up to
<N> separate ranges in each set (typically just 1). But rangeset per BAR
was chosen for the ease of implementation and existing code re-usability."
>
> As to overhead - did you compare sizeof(struct rangeset) + N *
> sizeof(struct range) with just N * sizeof(unsigned long [2])?
I was not thinking about data memory sizes in the first place, but about
the new code that would need to be introduced to handle that, and then
maintained.
>
> Jan
>
Thank you,
Oleksandr
On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
> On 19.11.21 14:45, Jan Beulich wrote:
>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> Instead of handling a single range set, that contains all the memory
>>>>> regions of all the BARs and ROM, have them per BAR.
>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>> ask that for posterity you say a word here about the overhead, to make
>>>> clear this was a conscious decision?
>>> Sure, but could you please help me with that sentence so that it pleases
>>> your eye? I mean that it was you who saw the overhead while I did not:
>>> were we to implement similar functionality ourselves, I still think we
>>> would end up duplicating rangesets at the end of the day.
>> "Note that rangesets were chosen here despite there being only up to
>> <N> separate ranges in each set (typically just 1)." Albeit that's
>> then still lacking a justification for the choice. Ease of
>> implementation?
> I guess yes. I'll put:
>
> "Note that rangesets were chosen here despite there being only up to
> <N> separate ranges in each set (typically just 1). But rangeset per BAR
> was chosen for the ease of implementation and existing code re-usability."

FTAOD please don't forget to replace the <N> - I wasn't sure if it would
be 2 or 3. Also (nit) I don't think starting the 2nd sentence with "But
..." fits with the 1st sentence.

Jan
On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>      spin_lock_init(&pdev->vpci->lock);
>
> +    header = &pdev->vpci->header;
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +
> +        bar->mem = rangeset_new(NULL, NULL, 0);

I don't recall why an anonymous range set was chosen back at the time
when vPCI was first implemented, but I think this needs to be changed
now that DomU-s get supported. Whether you do so right here or in a
prereq patch is secondary to me. It may be desirable to exclude them
from rangeset_domain_printk() (which would likely require a new
RANGESETF_* flag), but I think such resources should be associated
with their domains.

Jan
On 19.11.21 15:06, Jan Beulich wrote:
> On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
>> On 19.11.21 14:45, Jan Beulich wrote:
>>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Instead of handling a single range set, that contains all the memory
>>>>>> regions of all the BARs and ROM, have them per BAR.
>>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>>> ask that for posterity you say a word here about the overhead, to make
>>>>> clear this was a conscious decision?
>>>> Sure, but could you please help me with that sentence so that it pleases
>>>> your eye? I mean that it was you who saw the overhead while I did not:
>>>> were we to implement similar functionality ourselves, I still think we
>>>> would end up duplicating rangesets at the end of the day.
>>> "Note that rangesets were chosen here despite there being only up to
>>> <N> separate ranges in each set (typically just 1)." Albeit that's
>>> then still lacking a justification for the choice. Ease of
>>> implementation?
>> I guess yes. I'll put:
>>
>> "Note that rangesets were chosen here despite there being only up to
>> <N> separate ranges in each set (typically just 1). But rangeset per BAR
>> was chosen for the ease of implementation and existing code re-usability."
> FTAOD please don't forget to replace the <N> - I wasn't sure if it would
> be 2 or 3.
It seems we can't put the exact number, as it depends on how many MSI/MSI-X
holes there are, and that depends on arbitrary device properties.
> Also (nit) I don't think starting the 2nd sentence with "But
> ..." fits with the 1st sentence.
Sure, I will clean it up.
>
> Jan
>
Thank you,
Oleksandr
On 19.11.2021 14:19, Oleksandr Andrushchenko wrote:
>
>
> On 19.11.21 15:06, Jan Beulich wrote:
>> On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 14:45, Jan Beulich wrote:
>>>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Instead of handling a single range set, that contains all the memory
>>>>>>> regions of all the BARs and ROM, have them per BAR.
>>>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>>>> ask that for posterity you say a word here about the overhead, to make
>>>>>> clear this was a conscious decision?
>>>>> Sure, but could you please help me with that sentence so that it pleases
>>>>> your eye? I mean that it was you who saw the overhead while I did not:
>>>>> were we to implement similar functionality ourselves, I still think we
>>>>> would end up duplicating rangesets at the end of the day.
>>>> "Note that rangesets were chosen here despite there being only up to
>>>> <N> separate ranges in each set (typically just 1)." Albeit that's
>>>> then still lacking a justification for the choice. Ease of
>>>> implementation?
>>> I guess yes. I'll put:
>>>
>>> "Note that rangesets were chosen here despite there being only up to
>>> <N> separate ranges in each set (typically just 1). But rangeset per BAR
>>> was chosen for the ease of implementation and existing code re-usability."
>> FTAOD please don't forget to replace the <N> - I wasn't sure if it would
>> be 2 or 3.
> It seems we can't put the exact number, as it depends on how many MSI/MSI-X
> holes there are, and that depends on arbitrary device properties.

There aren't any MSI holes, and there can be at most 2 MSI-X holes iirc
(MSI-X table and PBA). What I don't recall is whether there are
constraints on these two, but istr them being fully independent. This
would make the upper bound 3 (both in one BAR, other BARs then all using
just a single range).

Jan
On 19.11.21 15:29, Jan Beulich wrote:
> On 19.11.2021 14:19, Oleksandr Andrushchenko wrote:
>>
>> On 19.11.21 15:06, Jan Beulich wrote:
>>> On 19.11.2021 13:50, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 14:45, Jan Beulich wrote:
>>>>> On 19.11.2021 13:13, Oleksandr Andrushchenko wrote:
>>>>>> On 19.11.21 14:05, Jan Beulich wrote:
>>>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>
>>>>>>>> Instead of handling a single range set, that contains all the memory
>>>>>>>> regions of all the BARs and ROM, have them per BAR.
>>>>>>> Iirc Roger did indicate agreement with the splitting. May I nevertheless
>>>>>>> ask that for posterity you say a word here about the overhead, to make
>>>>>>> clear this was a conscious decision?
>>>>>> Sure, but could you please help me with that sentence so that it pleases
>>>>>> your eye? I mean that it was you who saw the overhead while I did not:
>>>>>> were we to implement similar functionality ourselves, I still think we
>>>>>> would end up duplicating rangesets at the end of the day.
>>>>> "Note that rangesets were chosen here despite there being only up to
>>>>> <N> separate ranges in each set (typically just 1)." Albeit that's
>>>>> then still lacking a justification for the choice. Ease of
>>>>> implementation?
>>>> I guess yes. I'll put:
>>>>
>>>> "Note that rangesets were chosen here despite there being only up to
>>>> <N> separate ranges in each set (typically just 1). But rangeset per BAR
>>>> was chosen for the ease of implementation and existing code re-usability."
>>> FTAOD please don't forget to replace the <N> - I wasn't sure if it would
>>> be 2 or 3.
>> It seems we can't put the exact number, as it depends on how many MSI/MSI-X
>> holes there are, and that depends on arbitrary device properties.
> There aren't any MSI holes, and there can be at most 2 MSI-X holes iirc
> (MSI-X table and PBA). What I don't recall is whether there are
> constraints on these two, but istr them being fully independent. This
> would make the upper bound 3 (both in one BAR, other BARs then all using
> just a single range).
So if they are both in a single BAR (this is probably what I saw while
running QEMU for PVH Dom0 tests), then we may have up to 3 ranges per
BAR at most, so I will use 3 instead of N in the description and will
probably add a short explanation of how we came up with N == 3.
>
> Jan
>
Thank you!!
Oleksandr
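To make the bound concrete, a sketch using Xen's rangeset API; the page numbers are invented and return-value checks are dropped for brevity:

    /* One BAR spanning pages 0x100-0x10f, with both the MSI-X table and
     * the PBA living inside it. */
    struct rangeset *mem = rangeset_new(NULL, NULL, 0);

    rangeset_add_range(mem, 0x100, 0x10f);    /* whole BAR */
    rangeset_remove_range(mem, 0x102, 0x103); /* MSI-X table hole */
    rangeset_remove_range(mem, 0x108, 0x108); /* PBA hole */
    /* Left: [0x100, 0x101], [0x104, 0x107], [0x109, 0x10f]. */

Two independent holes can split one contiguous BAR into at most three remaining ranges, hence N == 3.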
On 19.11.21 15:16, Jan Beulich wrote:
> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>>      spin_lock_init(&pdev->vpci->lock);
>>
>> +    header = &pdev->vpci->header;
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>> +
>> +        bar->mem = rangeset_new(NULL, NULL, 0);
> I don't recall why an anonymous range set was chosen back at the time
> when vPCI was first implemented, but I think this needs to be changed
> now that DomU-s get supported. Whether you do so right here or in a
> prereq patch is secondary to me. It may be desirable to exclude them
> from rangeset_domain_printk() (which would likely require a new
> RANGESETF_* flag), but I think such resources should be associated
> with their domains.
What would be the proper name for such a range set then?
"vpci_bar"?
> Jan
>
Thank you,
Oleksandr
On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>
>
> On 19.11.21 15:16, Jan Beulich wrote:
>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>      spin_lock_init(&pdev->vpci->lock);
>>>
>>> +    header = &pdev->vpci->header;
>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>> +    {
>>> +        struct vpci_bar *bar = &header->bars[i];
>>> +
>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>> I don't recall why an anonymous range set was chosen back at the time
>> when vPCI was first implemented, but I think this needs to be changed
>> now that DomU-s get supported. Whether you do so right here or in a
>> prereq patch is secondary to me. It may be desirable to exclude them
>> from rangeset_domain_printk() (which would likely require a new
>> RANGESETF_* flag), but I think such resources should be associated
>> with their domains.
> What would be the proper name for such a range set then?
> "vpci_bar"?

E.g. bb:dd.f:BARn

Jan
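A minimal sketch of what such naming could look like in vpci_add_handlers(), assuming Xen's %pp format specifier for pci_sbdf_t; rangeset_new() copies the name into the set, so a stack buffer is fine:

    char str[32];

    /* Name the per-BAR rangeset after the device, e.g. "0000:00:01.0:BAR2",
     * and associate it with the device's owning domain. */
    snprintf(str, sizeof(str), "%pp:BAR%u", &pdev->sbdf, i);
    bar->mem = rangeset_new(pdev->domain, str, 0);

Associating the set with pdev->domain also puts it on the domain's rangeset list, which is what makes the printing question below relevant.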
On 19.11.21 15:57, Jan Beulich wrote:
> On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>> On 19.11.21 15:16, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>      spin_lock_init(&pdev->vpci->lock);
>>>>
>>>> +    header = &pdev->vpci->header;
>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>> +    {
>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>> +
>>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>>> I don't recall why an anonymous range set was chosen back at the time
>>> when vPCI was first implemented, but I think this needs to be changed
>>> now that DomU-s get supported. Whether you do so right here or in a
>>> prereq patch is secondary to me. It may be desirable to exclude them
>>> from rangeset_domain_printk() (which would likely require a new
>>> RANGESETF_* flag), but I think such resources should be associated
>>> with their domains.
>> What would be the proper name for such a range set then?
>> "vpci_bar"?
> E.g. bb:dd.f:BARn
Hm, indeed. I can only see a single flag, RANGESETF_prettyprint_hex, which
tells *how* to print, but I can't see any way to limit *what* is printed.
So, do you mean you want some logic implemented in rangeset_domain_printk
so it knows that this entry needs to be skipped while printing?
RANGESETF_skip_print?
>
> Jan
>
Thank you,
Oleksandr
On 19.11.2021 15:09, Oleksandr Andrushchenko wrote:
> On 19.11.21 15:57, Jan Beulich wrote:
>> On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 15:16, Jan Beulich wrote:
>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>>      spin_lock_init(&pdev->vpci->lock);
>>>>>
>>>>> +    header = &pdev->vpci->header;
>>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>> +    {
>>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>>> +
>>>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>>>> I don't recall why an anonymous range set was chosen back at the time
>>>> when vPCI was first implemented, but I think this needs to be changed
>>>> now that DomU-s get supported. Whether you do so right here or in a
>>>> prereq patch is secondary to me. It may be desirable to exclude them
>>>> from rangeset_domain_printk() (which would likely require a new
>>>> RANGESETF_* flag), but I think such resources should be associated
>>>> with their domains.
>>> What would be the proper name for such a range set then?
>>> "vpci_bar"?
>> E.g. bb:dd.f:BARn
> Hm, indeed. I can only see a single flag, RANGESETF_prettyprint_hex, which
> tells *how* to print, but I can't see any way to limit *what* is printed.
> So, do you mean you want some logic implemented in rangeset_domain_printk
> so it knows that this entry needs to be skipped while printing?
> RANGESETF_skip_print?

Yes, albeit I'd call the flag e.g. RANGESETF_no_print.

Jan
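A sketch of how such a flag might look, modelled on the existing RANGESETF_prettyprint_hex handling in xen/common/rangeset.c; the bit position and the exact loop body are assumptions:

    /* xen/include/xen/rangeset.h */
    #define RANGESETF_prettyprint_hex (1U << 0)
    #define RANGESETF_no_print        (1U << 1) /* skip in per-domain dumps */

    /* In rangeset_domain_printk(), skip flagged sets while dumping: */
    list_for_each_entry ( r, &d->rangesets, rangeset_list )
    {
        if ( r->flags & RANGESETF_no_print )
            continue;

        printk("    ");
        rangeset_printk(r);
        printk("\n");
    }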
On 22.11.21 10:24, Jan Beulich wrote:
> On 19.11.2021 15:09, Oleksandr Andrushchenko wrote:
>> On 19.11.21 15:57, Jan Beulich wrote:
>>> On 19.11.2021 14:41, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:16, Jan Beulich wrote:
>>>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>>>> @@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>>>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>>>      spin_lock_init(&pdev->vpci->lock);
>>>>>>
>>>>>> +    header = &pdev->vpci->header;
>>>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>> +    {
>>>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>>>> +
>>>>>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>>>>> I don't recall why an anonymous range set was chosen back at the time
>>>>> when vPCI was first implemented, but I think this needs to be changed
>>>>> now that DomU-s get supported. Whether you do so right here or in a
>>>>> prereq patch is secondary to me. It may be desirable to exclude them
>>>>> from rangeset_domain_printk() (which would likely require a new
>>>>> RANGESETF_* flag), but I think such resources should be associated
>>>>> with their domains.
>>>> What would be the proper name for such a range set then?
>>>> "vpci_bar"?
>>> E.g. bb:dd.f:BARn
>> Hm, indeed. I can only see a single flag, RANGESETF_prettyprint_hex, which
>> tells *how* to print, but I can't see any way to limit *what* is printed.
>> So, do you mean you want some logic implemented in rangeset_domain_printk
>> so it knows that this entry needs to be skipped while printing?
>> RANGESETF_skip_print?
> Yes, albeit I'd call the flag e.g. RANGESETF_no_print.
Then I see two patches here: one which introduces a generic
RANGESETF_no_print flag, and a second one converting the anonymous
rangesets used by vPCI.
>
> Jan
>
Thank you,
Oleksandr
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 1239051ee8ff..5fc2dfbbc864 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,34 +131,50 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    if ( v->vpci.map_pending )
     {
         struct map_data data = {
             .d = v->domain,
             .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+        struct pci_dev *pdev = v->vpci.pdev;
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
 
-        if ( rc == -ERESTART )
-            return true;
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        {
+            struct vpci_bar *bar = &header->bars[i];
+            int rc;
 
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
 
-        vpci_cancel_pending(v->vpci.pdev);
-        if ( rc )
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
-             */
-            vpci_remove_device(v->vpci.pdev);
+            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+            if ( rc == -ERESTART )
+                return true;
+
+            spin_lock(&pdev->vpci->lock);
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(pdev,
+                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
+                            !rc && v->vpci.rom_only);
+            spin_unlock(&pdev->vpci->lock);
+
+            if ( rc )
+            {
+                /*
+                 * FIXME: in case of failure remove the device from the domain.
+                 * Note that there might still be leftover mappings. While this is
+                 * safe for Dom0, for DomUs the domain will likely need to be
+                 * killed in order to avoid leaking stale p2m mappings on
+                 * failure.
+                 */
+                vpci_remove_device(pdev);
+                break;
+            }
+        }
+        v->vpci.map_pending = false;
     }
 
     return false;
@@ -169,22 +185,48 @@ void vpci_cancel_pending(const struct pci_dev *pdev)
 {
     struct vcpu *v = current;
 
     /* Cancel any pending work now. */
-    if ( v->vpci.mem && v->vpci.pdev == pdev)
+    if ( v->vpci.map_pending && v->vpci.pdev == pdev)
     {
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
+        int rc;
+
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        {
+            struct vpci_bar *bar = &header->bars[i];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, 0, ~0ULL);
+            if ( rc )
+                printk(XENLOG_ERR
+                       "%pd %pp failed to remove range set for BAR: %d\n",
+                       v->domain, &pdev->sbdf, rc);
+        }
+        v->vpci.map_pending = false;
     }
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-                            struct rangeset *mem, uint16_t cmd)
+                            uint16_t cmd)
 {
     struct map_data data = { .d = d, .map = true };
-    int rc;
+    struct vpci_header *header = &pdev->vpci->header;
+    int rc = 0;
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( rangeset_is_empty(bar->mem) )
+            continue;
 
-    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
-        process_pending_softirqs();
-    rangeset_destroy(mem);
+        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
+                                              &data)) == -ERESTART )
+            process_pending_softirqs();
+    }
 
     if ( !rc )
         modify_decoding(pdev, cmd, false);
@@ -192,7 +234,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 }
 
 static void defer_map(struct domain *d, struct pci_dev *pdev,
-                      struct rangeset *mem, uint16_t cmd, bool rom_only)
+                      uint16_t cmd, bool rom_only)
 {
     struct vcpu *curr = current;
 
@@ -203,9 +245,9 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
     curr->vpci.cmd = cmd;
     curr->vpci.rom_only = rom_only;
+    curr->vpci.map_pending = true;
 
     /*
      * Raise a scheduler softirq in order to prevent the guest from resuming
      * execution with pending mapping operations, to trigger the invocation
@@ -217,42 +259,40 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
-    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     struct pci_dev *tmp, *dev = NULL;
     const struct vpci_msix *msix = pdev->vpci->msix;
-    unsigned int i;
+    unsigned int i, j;
     int rc;
-
-    if ( !mem )
-        return -ENOMEM;
+    bool map_pending;
 
     /*
-     * Create a rangeset that represents the current device BARs memory region
+     * Create a rangeset per BAR that represents the current device memory region
      * and compare it against all the currently active BAR memory regions. If
      * an overlap is found, subtract it from the region to be mapped/unmapped.
     *
-     * First fill the rangeset with all the BARs of this device or with the ROM
+     * First fill the rangesets with all the BARs of this device or with the ROM
      * BAR only, depending on whether the guest is toggling the memory decode
      * bit of the command register, or the enable bit of the ROM BAR register.
      */
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        const struct vpci_bar *bar = &header->bars[i];
+        struct vpci_bar *bar = &header->bars[i];
         unsigned long start = PFN_DOWN(bar->addr);
         unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
+        ASSERT(bar->mem);
+
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
                        : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
             continue;
 
-        rc = rangeset_add_range(mem, start, end);
+        rc = rangeset_add_range(bar->mem, start, end);
         if ( rc )
        {
             printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
                    start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            goto fail;
         }
     }
 
@@ -263,14 +303,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
         unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
                                      vmsix_table_size(pdev->vpci, i) - 1);
 
-        rc = rangeset_remove_range(mem, start, end);
-        if ( rc )
+        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
         {
-            printk(XENLOG_G_WARNING
-                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
-                   start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            const struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
+                       start, end, rc);
+                goto fail;
+            }
         }
     }
 
@@ -302,7 +349,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             unsigned long start = PFN_DOWN(bar->addr);
             unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
-            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
+            if ( !bar->enabled ||
+                 !rangeset_overlaps_range(bar->mem, start, end) ||
                  /*
                   * If only the ROM enable bit is toggled check against other
                   * BARs in the same device for overlaps, but not against the
@@ -311,13 +359,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                  (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
                 continue;
 
-            rc = rangeset_remove_range(mem, start, end);
+            rc = rangeset_remove_range(bar->mem, start, end);
             if ( rc )
             {
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
-                rangeset_destroy(mem);
-                return rc;
+                goto fail;
             }
         }
     }
@@ -335,12 +382,35 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
          * will always be to establish mappings and process all the BARs.
          */
         ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
-        return apply_map(pdev->domain, pdev, mem, cmd);
+        return apply_map(pdev->domain, pdev, cmd);
     }
 
-    defer_map(dev->domain, dev, mem, cmd, rom_only);
+    /* Find out whether any memory ranges are left after MSI-X and overlaps. */
+    map_pending = false;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        if ( !rangeset_is_empty(header->bars[i].mem) )
+        {
+            map_pending = true;
+            break;
+        }
+
+    /*
+     * There are cases when a PCI device, a root port for example, has neither
+     * memory space nor I/O. In this case the PCI command register write would
+     * be skipped, leaving the underlying PCI device non-functional, so:
+     *  - if there are no regions, write the command register now
+     *  - if there are regions, defer the work and write later on
+     */
+    if ( !map_pending )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    else
+        defer_map(dev->domain, dev, cmd, rom_only);
 
     return 0;
 
+fail:
+    vpci_cancel_pending(pdev);
+    return rc;
 }
 
 static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 5f086398a98c..45733300f00b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -55,7 +55,12 @@ void vpci_remove_device_handlers(const struct pci_dev *pdev)
 
 void vpci_remove_device(struct pci_dev *pdev)
 {
+    struct vpci_header *header = &pdev->vpci->header;
+    unsigned int i;
+
     vpci_cancel_pending(pdev);
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        rangeset_destroy(header->bars[i].mem);
     vpci_remove_device_handlers(pdev);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
@@ -80,6 +85,8 @@ static int run_vpci_init(struct pci_dev *pdev)
 
 int vpci_add_handlers(struct pci_dev *pdev)
 {
+    struct vpci_header *header;
+    unsigned int i;
     int rc;
 
     if ( !has_vpci(pdev->domain) )
@@ -95,10 +102,25 @@ int vpci_add_handlers(struct pci_dev *pdev)
     INIT_LIST_HEAD(&pdev->vpci->handlers);
     spin_lock_init(&pdev->vpci->lock);
 
+    header = &pdev->vpci->header;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        bar->mem = rangeset_new(NULL, NULL, 0);
+        if ( !bar->mem )
+        {
+            rc = -ENOMEM;
+            goto fail;
+        }
+    }
+
     rc = run_vpci_init(pdev);
-    if ( rc )
-        vpci_remove_device(pdev);
+    if ( !rc )
+        return 0;
 
+ fail:
+    vpci_remove_device(pdev);
     return rc;
 }
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 3e7428da822c..143f3166a730 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -75,6 +75,7 @@ struct vpci {
         /* Guest view of the BAR. */
         uint64_t guest_addr;
         uint64_t size;
+        struct rangeset *mem;
         enum {
             VPCI_BAR_EMPTY,
             VPCI_BAR_IO,
@@ -149,9 +150,9 @@ struct vpci {
 
 struct vpci_vcpu {
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
     struct pci_dev *pdev;
     uint16_t cmd;
+    bool map_pending : 1;
     bool rom_only : 1;
 };