Message ID | 20241004181754.8960-1-nathan.fontenot@amd.com |
---|---|
State | Superseded |
Headers | show |
Series | [RFC] cxl: Update Soft Reserved resources upon region creation | expand |
On Fri, 4 Oct 2024 13:17:54 -0500 Nathan Fontenot <nathan.fontenot@amd.com> wrote: > Update handling of SOFT RESERVE iomem resources that intersect with > CXL region resources to remove the intersections from the SOFT RESERVE > resources. The current approach of leaving the SOFT RESERVE > resource as is can cause failures during hotplug replace of CXL > devices because the resource is not available for reuse after > teardown. Good to give an example here + any debug error messages etc. > > The approach sought is to trim out any pieces of SOFT RESERVE > resources that intersect with CXL regions. To do this, first set > aside any SOFT RESERVE resources that intersect with a CFMWS > into a separate resource tree during e820__reserve_resources_late() > that would have been otherwise added to the iomem resource tree. > > As CXL regions are created the cxl resource created for the new > region is used to trim intersections from the SOFT RESERVE > resources that were previously set aside. > > The next steps are to add any SOFT RESERVE resources remaining to the > iomem resource tree after CXL device probe completes and to notify > the dax driver so it may consume the added SOFT RESERVE resources. > > This patch includes the use of a delayed work queue to wait > for CXL device probe completion and then have a worker thread add > any remaining SOFT RESERVE resources to the iomem resource tree. > > Not in this patch is notification of the dax driver so it may consume > the SOFT RESERVE resources. > > The goal of presenting this RFC is to drive discussion of the > current approach for trimming SOFT RESERVE resources, the use of > a delayed work queue to add remaining SOFT RESERVE resources to > the iomem resource tree, and methods for notifying the dax driver > of any newly added SOFT RESERVE resources. > > NOTE: As this is a RFC the temporary pr_err("CXL DEBUG...") messages > have been left in to aid in testing and validation. > > Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com> > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Trivial, but why Alison's SoB? She didn't send the patch which would be normal reason for last SoB in a list like this (handling of the patch). Maybe a Co-developed-by is missing? Whilst I appreciate this is an RFC, the arch specific bleed into CXL code needs to go. + we need some info here on whether this is an x86 specific issue. I'm fairly sure it's not and that should be called out as future work. Also, much of this is actually generic, with only a few bits being e820 related. Can we pull the soft reserve list etc out of them into generic code. To me the general approach seems reasonable but I'd suggest +CC linux-mm as the issue maybe more about the impacts on DAX than this working for CXL. Comments inline are mostly general suggestions of simplifications and refactoring to aid readability. > --- > arch/x86/include/asm/e820/api.h | 3 + > arch/x86/kernel/e820.c | 156 +++++++++++++++++++++++++++++++- > drivers/cxl/core/region.c | 14 ++- > drivers/cxl/port.c | 34 +++++++ > 4 files changed, 203 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h > index 2e74a7f0e935..542a52871beb 100644 > --- a/arch/x86/include/asm/e820/api.h > +++ b/arch/x86/include/asm/e820/api.h > @@ -44,6 +44,9 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); > > extern int e820__get_entry_type(u64 start, u64 end); > > +extern void e820__trim_soft_reserves(const struct resource *cxl_res); > +extern void e820__insert_soft_reserves(void); > + > /* > * Returns true iff the specified range [start,end) is completely contained inside > * the ISA region. > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > index 4893d30ce438..855c26460bb9 100644 > --- a/arch/x86/kernel/e820.c > +++ b/arch/x86/kernel/e820.c > +static void e820__trim_soft_reserve(struct resource *res, > + const struct resource *cxl_res) > +{ > + resource_size_t new_start, new_end; > + int rc; > + > + pr_err("CXL DEBUG Trimming Soft Reserves for %pr\n", cxl_res); > + > + if (res->start == cxl_res->start && res->end == cxl_res->end) { > + pr_err("CXL DEBUG Releasing resource %pr\n", res); > + release_resource(res); > + kfree(res); > + } else if (res->start == cxl_res->start || res->end == cxl_res->end) { > + if (res->start == cxl_res->start) { Not a huge amount of sharing in here. } else if (res->start = cxl_res->start) { /* Left over bit at top end */ rc = adjust_resource(res, cxl_res->end + 1, res->end - cxl_res->end); if (rc)... } else if (res->end == cxl_res->end) { /* Left over bit at bottom end */ rc = adjust_resource(res, res->start, cxl_res->start - res->start); if (rc) ... } else { /* Left over bits on either side. */ } > + new_start = cxl_res->end + 1; > + new_end = res->end; > + } else { > + new_start = res->start; > + new_end = cxl_res->start - 1; > + } > + > + pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", > + res, new_start, new_end); > + rc = adjust_resource(res, new_start, new_end - new_start + 1); > + if (rc) > + pr_debug("Cannot adjust soft reserved resource %pr\n", > + res); > + } else { > + new_start = res->start; > + new_end = res->end; Confusing use of parameters here as they don't match use in early legs. Maybe just putting the values inline would be clearer. > + > + /*Adjust existing to beginning resource */ > + pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", res, > + new_start, cxl_res->start); > + adjust_resource(res, new_start, cxl_res->start - new_start + 1); Off by 1? It's different form above leftover bottom part (should be the same). > + > + /* Add new resource for end piece */ > + e820__add_soft_reserve(cxl_res->end + 1, new_end - cxl_res->end, > + res->flags); > + } > +} > + > +void e820__trim_soft_reserves(const struct resource *cxl_res) > +{ > + struct resource *res, *next; > + > + pr_err("CXL DEBUG Trimming Soft Reserves\n"); > + for (res = e820_sr_res.child; res; res = next) { for (res = e820_sr_res.child; res; res = res->sibling) is the same I think and avoids need for next. > + next = res->sibling; > + > + if (resource_contains(res, cxl_res)) { > + e820__trim_soft_reserve(res, cxl_res); > + break; return; > + } > + } > +} > +EXPORT_SYMBOL_GPL(e820__trim_soft_reserves); > + > +static int __init e820_parse_cfmws(union acpi_subtable_headers *hdr, void *arg, > + const unsigned long unused) > +{ > + struct acpi_cedt_cfmws *cfmws; > + struct resource *res = arg; > + struct resource cfmws_res; > + > + /* Validation check, remove when finished debugging */ > + if (!res->parent && res->end) > + pr_err("CXL DEBUG Should insert %pr\n", res); > + > + if (res->parent || !res->end) > + return 0; > + > + cfmws = (struct acpi_cedt_cfmws *)hdr; > + cfmws_res = DEFINE_RES_MEM(cfmws->base_hpa, > + cfmws->base_hpa + cfmws->window_size); > + pr_err("CXL DEBUG Found CFMWS: %pr\n", &cfmws_res); > + > + if (resource_overlaps(&cfmws_res, res)) { > + pr_err("CXL DEBUG Found SOFT RESERVE intersection %llx - %llx : %llx - %llx\n", > + res->start, res->end, cfmws_res.start, cfmws_res.end); > + e820__add_soft_reserve(res->start, resource_size(res), > + res->flags); > + return 1; > + } > + > + return 0; > +} > + > void __init e820__reserve_resources_late(void) > { > - int i; > + int i, rc; > struct resource *res; > > + /* > + * Prior to inserting SOFT_RESERVED resources we want to check for an > + * intersection with potential CXL resources. Any SOFT_RESERVED resources > + * that do intersect a potential CXL resource are set aside so they > + * can be trimmed to accommodate CXL resource intersections and added to > + * the iomem resource tree after the CXL drivers have completed their > + * device probe. > + */ > + pr_err("CXL DEBUG Checking e820 iomem resources\n"); > + > res = e820_res; > for (i = 0; i < e820_table->nr_entries; i++) { > - if (!res->parent && res->end) > - insert_resource_expand_to_fit(&iomem_resource, res); > + pr_err("CXL DEBUG Checking e820 iomem resource %llx - %llx\n", > + res->start, res->end); > + if (res->desc == IORES_DESC_SOFT_RESERVED) { > + rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, > + e820_parse_cfmws, res); Can we make this somewhat generic by having a parse to find a match then call e820__add_soft_reserve(res->start, resource_size(res), res->flags); if the match occurs. That way the check can be in arch independent code. cxl_res_overlaps_cfmws() perhaps. > + if (rc) { > + res++; Move the res++ into the for loop ... i++, res++) > + continue; > + } > + } > + pr_err("CXL DEBUG Inserting %llx - %llx\n", res->start, res->end); > + insert_resource_expand_to_fit(&iomem_resource, res); > res++; > } > > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c > index d7d5d982ce69..9f94730c488f 100644 > --- a/drivers/cxl/port.c > +++ b/drivers/cxl/port.c > @@ -7,6 +7,10 @@ > #include "cxlmem.h" > #include "cxlpci.h" > > +#if CONFIG_X86 > +#include <asm/e820/api.h> > +#endif > + > /** > * DOC: cxl port > * > @@ -89,6 +93,35 @@ static int cxl_switch_port_probe(struct cxl_port *port) > return -ENXIO; > } > > +DECLARE_RWSEM(cxl_sr_rwsem); > + > +static void cxl_sr_update(struct work_struct *w) > +{ > + down_write(&cxl_sr_rwsem); > + pr_err("CXL DEBUG Updating soft reserves\n"); > + e820__insert_soft_reserves(); That's arch specific code bleeding over to the cxl code. So needs at minimum to be hidden behind an arch call. > + up_write(&cxl_sr_rwsem); > +} > + > +DECLARE_DELAYED_WORK(cxl_sr_work, cxl_sr_update); > + > +static void schedule_soft_reserve_update(void) > +{ > + static bool update_scheduled; > + int timeout = 5 * HZ; > + > + down_write(&cxl_sr_rwsem); For the locks use guard() However, what are they for? > + if (update_scheduled) { > + pr_err("CXL DEBUG modifying delayed work timeout\n"); > + mod_delayed_work(system_wq, &cxl_sr_work, timeout); See docs for mod_delayed_work_on() "If @dwork is idle, equivalent to queue_delayed_work_on(); " which is where schedule_delayed_work() ends up. So other than maybe for the debug prints you don't need this dance just always call mod_delayed_work() > + } else { > + pr_err("CXL DEBUG Adding delayed work\n"); > + schedule_delayed_work(&cxl_sr_work, timeout); > + update_scheduled = true; > + } > + up_write(&cxl_sr_rwsem); > +} > + > static int cxl_endpoint_port_probe(struct cxl_port *port) > { > struct cxl_endpoint_dvsec_info info = { .port = port }; > @@ -140,6 +173,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) > */ > device_for_each_child(&port->dev, root, discover_region); > > + schedule_soft_reserve_update(); > return 0; > } >
Hi Jonathan, On 10/16/2024 10:43 AM, Jonathan Cameron wrote: > On Fri, 4 Oct 2024 13:17:54 -0500 > Nathan Fontenot <nathan.fontenot@amd.com> wrote: > >> Update handling of SOFT RESERVE iomem resources that intersect with >> CXL region resources to remove the intersections from the SOFT RESERVE >> resources. The current approach of leaving the SOFT RESERVE >> resource as is can cause failures during hotplug replace of CXL >> devices because the resource is not available for reuse after >> teardown. > Good to give an example here + any debug error messages etc. Good point. I'll include a better explanation of the failure with examples in the next version. > >> >> The approach sought is to trim out any pieces of SOFT RESERVE >> resources that intersect with CXL regions. To do this, first set >> aside any SOFT RESERVE resources that intersect with a CFMWS >> into a separate resource tree during e820__reserve_resources_late() >> that would have been otherwise added to the iomem resource tree. >> >> As CXL regions are created the cxl resource created for the new >> region is used to trim intersections from the SOFT RESERVE >> resources that were previously set aside. >> >> The next steps are to add any SOFT RESERVE resources remaining to the >> iomem resource tree after CXL device probe completes and to notify >> the dax driver so it may consume the added SOFT RESERVE resources. >> >> This patch includes the use of a delayed work queue to wait >> for CXL device probe completion and then have a worker thread add >> any remaining SOFT RESERVE resources to the iomem resource tree. >> >> Not in this patch is notification of the dax driver so it may consume >> the SOFT RESERVE resources. >> >> The goal of presenting this RFC is to drive discussion of the >> current approach for trimming SOFT RESERVE resources, the use of >> a delayed work queue to add remaining SOFT RESERVE resources to >> the iomem resource tree, and methods for notifying the dax driver >> of any newly added SOFT RESERVE resources. >> >> NOTE: As this is a RFC the temporary pr_err("CXL DEBUG...") messages >> have been left in to aid in testing and validation. >> >> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com> >> Signed-off-by: Alison Schofield <alison.schofield@intel.com> > > Trivial, but why Alison's SoB? She didn't send the patch which would > be normal reason for last SoB in a list like this (handling of the patch). > > Maybe a Co-developed-by is missing? Alison helped develop the code so a Co-developed-by should be here. > > Whilst I appreciate this is an RFC, the arch specific bleed > into CXL code needs to go. + we need some info here on whether this > is an x86 specific issue. I'm fairly sure it's not and that should > be called out as future work. As far as I know this is an x86 issue for handling the SOFT RESERVE resources, if other arches are seeing this than we need to update the approach. > > Also, much of this is actually generic, with only a few bits > being e820 related. Can we pull the soft reserve list etc out > of them into generic code. Yes, we could do that. The handling of the soft reserve list doesn't need to live in the e820 code. > > To me the general approach seems reasonable but I'd suggest +CC linux-mm > as the issue maybe more about the impacts on DAX than this working for CXL. ok, I'll add linux-mm. One of the pieces I wanted to get feedback on was the dax component and how we could invoke the dax driver to consume any soft reserves that remain after the cxl driver removes intersecting cxl region resources. > > Comments inline are mostly general suggestions of simplifications > and refactoring to aid readability. > >> --- >> arch/x86/include/asm/e820/api.h | 3 + >> arch/x86/kernel/e820.c | 156 +++++++++++++++++++++++++++++++- >> drivers/cxl/core/region.c | 14 ++- >> drivers/cxl/port.c | 34 +++++++ >> 4 files changed, 203 insertions(+), 4 deletions(-) >> >> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h >> index 2e74a7f0e935..542a52871beb 100644 >> --- a/arch/x86/include/asm/e820/api.h >> +++ b/arch/x86/include/asm/e820/api.h >> @@ -44,6 +44,9 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); >> >> extern int e820__get_entry_type(u64 start, u64 end); >> >> +extern void e820__trim_soft_reserves(const struct resource *cxl_res); >> +extern void e820__insert_soft_reserves(void); >> + >> /* >> * Returns true iff the specified range [start,end) is completely contained inside >> * the ISA region. >> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >> index 4893d30ce438..855c26460bb9 100644 >> --- a/arch/x86/kernel/e820.c >> +++ b/arch/x86/kernel/e820.c > > > >> +static void e820__trim_soft_reserve(struct resource *res, >> + const struct resource *cxl_res) >> +{ >> + resource_size_t new_start, new_end; >> + int rc; >> + >> + pr_err("CXL DEBUG Trimming Soft Reserves for %pr\n", cxl_res); >> + >> + if (res->start == cxl_res->start && res->end == cxl_res->end) { >> + pr_err("CXL DEBUG Releasing resource %pr\n", res); >> + release_resource(res); >> + kfree(res); >> + } else if (res->start == cxl_res->start || res->end == cxl_res->end) { >> + if (res->start == cxl_res->start) { > Not a huge amount of sharing in here. > } else if (res->start = cxl_res->start) { > /* Left over bit at top end */ > rc = adjust_resource(res, cxl_res->end + 1, res->end - cxl_res->end); > if (rc)... > > } else if (res->end == cxl_res->end) { > /* Left over bit at bottom end */ > rc = adjust_resource(res, res->start, cxl_res->start - res->start); > if (rc) ... > } else { > /* Left over bits on either side. */ > } Yep, I'll update that. >> + new_start = cxl_res->end + 1; >> + new_end = res->end; >> + } else { >> + new_start = res->start; >> + new_end = cxl_res->start - 1; >> + } >> + >> + pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", >> + res, new_start, new_end); >> + rc = adjust_resource(res, new_start, new_end - new_start + 1); >> + if (rc) >> + pr_debug("Cannot adjust soft reserved resource %pr\n", >> + res); >> + } else { >> + new_start = res->start; >> + new_end = res->end; > Confusing use of parameters here as they don't match use in early legs. > Maybe just putting the values inline would be clearer. will do. >> + >> + /*Adjust existing to beginning resource */ >> + pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", res, >> + new_start, cxl_res->start); >> + adjust_resource(res, new_start, cxl_res->start - new_start + 1); > > Off by 1? It's different form above leftover bottom part (should be the same). I thought I got the math right, maybe I didn't. I'll re-run tests to and check it. > > >> + >> + /* Add new resource for end piece */ >> + e820__add_soft_reserve(cxl_res->end + 1, new_end - cxl_res->end, >> + res->flags); >> + } >> +} >> + >> +void e820__trim_soft_reserves(const struct resource *cxl_res) >> +{ >> + struct resource *res, *next; >> + >> + pr_err("CXL DEBUG Trimming Soft Reserves\n"); >> + for (res = e820_sr_res.child; res; res = next) { > for (res = e820_sr_res.child; res; res = res->sibling) > is the same I think and avoids need for next. Yes, it should be. I'll update it. >> + next = res->sibling; >> + >> + if (resource_contains(res, cxl_res)) { >> + e820__trim_soft_reserve(res, cxl_res); >> + break; > return; > >> + } >> + } >> +} >> +EXPORT_SYMBOL_GPL(e820__trim_soft_reserves); >> + >> +static int __init e820_parse_cfmws(union acpi_subtable_headers *hdr, void *arg, >> + const unsigned long unused) >> +{ >> + struct acpi_cedt_cfmws *cfmws; >> + struct resource *res = arg; >> + struct resource cfmws_res; >> + >> + /* Validation check, remove when finished debugging */ >> + if (!res->parent && res->end) >> + pr_err("CXL DEBUG Should insert %pr\n", res); >> + >> + if (res->parent || !res->end) >> + return 0; >> + >> + cfmws = (struct acpi_cedt_cfmws *)hdr; >> + cfmws_res = DEFINE_RES_MEM(cfmws->base_hpa, >> + cfmws->base_hpa + cfmws->window_size); >> + pr_err("CXL DEBUG Found CFMWS: %pr\n", &cfmws_res); >> + >> + if (resource_overlaps(&cfmws_res, res)) { >> + pr_err("CXL DEBUG Found SOFT RESERVE intersection %llx - %llx : %llx - %llx\n", >> + res->start, res->end, cfmws_res.start, cfmws_res.end); >> + e820__add_soft_reserve(res->start, resource_size(res), >> + res->flags); >> + return 1; >> + } >> + >> + return 0; >> +} >> + >> void __init e820__reserve_resources_late(void) >> { >> - int i; >> + int i, rc; >> struct resource *res; >> >> + /* >> + * Prior to inserting SOFT_RESERVED resources we want to check for an >> + * intersection with potential CXL resources. Any SOFT_RESERVED resources >> + * that do intersect a potential CXL resource are set aside so they >> + * can be trimmed to accommodate CXL resource intersections and added to >> + * the iomem resource tree after the CXL drivers have completed their >> + * device probe. >> + */ >> + pr_err("CXL DEBUG Checking e820 iomem resources\n"); >> + >> res = e820_res; >> for (i = 0; i < e820_table->nr_entries; i++) { >> - if (!res->parent && res->end) >> - insert_resource_expand_to_fit(&iomem_resource, res); >> + pr_err("CXL DEBUG Checking e820 iomem resource %llx - %llx\n", >> + res->start, res->end); >> + if (res->desc == IORES_DESC_SOFT_RESERVED) { >> + rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, >> + e820_parse_cfmws, res); > Can we make this somewhat generic by having a parse to find a match > then call > e820__add_soft_reserve(res->start, resource_size(res), res->flags); > if the match occurs. That way the check can be in arch independent code. > > cxl_res_overlaps_cfmws() perhaps. > I like this better than what I have. I was never real crazy about having the e820 code parsing the cfmws. > >> + if (rc) { >> + res++; > > Move the res++ into the for loop ... i++, res++) will do. > >> + continue; >> + } >> + } >> + pr_err("CXL DEBUG Inserting %llx - %llx\n", res->start, res->end); >> + insert_resource_expand_to_fit(&iomem_resource, res); >> res++; >> } >> > >> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c >> index d7d5d982ce69..9f94730c488f 100644 >> --- a/drivers/cxl/port.c >> +++ b/drivers/cxl/port.c >> @@ -7,6 +7,10 @@ >> #include "cxlmem.h" >> #include "cxlpci.h" >> >> +#if CONFIG_X86 >> +#include <asm/e820/api.h> >> +#endif >> + >> /** >> * DOC: cxl port >> * >> @@ -89,6 +93,35 @@ static int cxl_switch_port_probe(struct cxl_port *port) >> return -ENXIO; >> } >> >> +DECLARE_RWSEM(cxl_sr_rwsem); >> + >> +static void cxl_sr_update(struct work_struct *w) >> +{ >> + down_write(&cxl_sr_rwsem); >> + pr_err("CXL DEBUG Updating soft reserves\n"); >> + e820__insert_soft_reserves(); > > That's arch specific code bleeding over to the cxl code. > So needs at minimum to be hidden behind an arch call. Agreed, keeping #ifdef <ARCH> out of the driver would be good. I'll look at adding an arch call. > >> + up_write(&cxl_sr_rwsem); >> +} >> + >> +DECLARE_DELAYED_WORK(cxl_sr_work, cxl_sr_update); >> + >> +static void schedule_soft_reserve_update(void) >> +{ >> + static bool update_scheduled; >> + int timeout = 5 * HZ; >> + >> + down_write(&cxl_sr_rwsem); > For the locks use guard() Yes can switch to guard(). > > However, what are they for? The concern is that we could have races on scheduling the soft reserve updates and I wanted to try to keep only one work item on the queue. They may not be needed, I was erring on the side of safety > >> + if (update_scheduled) { >> + pr_err("CXL DEBUG modifying delayed work timeout\n"); >> + mod_delayed_work(system_wq, &cxl_sr_work, timeout); > > See docs for mod_delayed_work_on() > > "If @dwork is idle, equivalent to queue_delayed_work_on(); " > which is where schedule_delayed_work() ends up. > So other than maybe for the debug prints you don't need this dance > just always call mod_delayed_work() > Thanks, I'll look into it. ...and thanks for all the review comments. -Nathan > > >> + } else { >> + pr_err("CXL DEBUG Adding delayed work\n"); >> + schedule_delayed_work(&cxl_sr_work, timeout); >> + update_scheduled = true; >> + } >> + up_write(&cxl_sr_rwsem); >> +} >> + >> static int cxl_endpoint_port_probe(struct cxl_port *port) >> { >> struct cxl_endpoint_dvsec_info info = { .port = port }; >> @@ -140,6 +173,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) >> */ >> device_for_each_child(&port->dev, root, discover_region); >> >> + schedule_soft_reserve_update(); >> return 0; >> } >> >
On Thu, 17 Oct 2024 09:49:01 -0500 "Fontenot, Nathan" <nafonten@amd.com> wrote: > Hi Jonathan, Hi Nathan, > > On 10/16/2024 10:43 AM, Jonathan Cameron wrote: > > On Fri, 4 Oct 2024 13:17:54 -0500 > > Nathan Fontenot <nathan.fontenot@amd.com> wrote: > > > >> Update handling of SOFT RESERVE iomem resources that intersect with > >> CXL region resources to remove the intersections from the SOFT RESERVE > >> resources. The current approach of leaving the SOFT RESERVE > >> resource as is can cause failures during hotplug replace of CXL > >> devices because the resource is not available for reuse after > >> teardown. > > Good to give an example here + any debug error messages etc. > > Good point. I'll include a better explanation of the failure with examples > in the next version. Thanks > > > > >> > >> The approach sought is to trim out any pieces of SOFT RESERVE > >> resources that intersect with CXL regions. To do this, first set > >> aside any SOFT RESERVE resources that intersect with a CFMWS > >> into a separate resource tree during e820__reserve_resources_late() > >> that would have been otherwise added to the iomem resource tree. > >> > >> As CXL regions are created the cxl resource created for the new > >> region is used to trim intersections from the SOFT RESERVE > >> resources that were previously set aside. > >> > >> The next steps are to add any SOFT RESERVE resources remaining to the > >> iomem resource tree after CXL device probe completes and to notify > >> the dax driver so it may consume the added SOFT RESERVE resources. > >> > >> This patch includes the use of a delayed work queue to wait > >> for CXL device probe completion and then have a worker thread add > >> any remaining SOFT RESERVE resources to the iomem resource tree. > >> > >> Not in this patch is notification of the dax driver so it may consume > >> the SOFT RESERVE resources. > >> > >> The goal of presenting this RFC is to drive discussion of the > >> current approach for trimming SOFT RESERVE resources, the use of > >> a delayed work queue to add remaining SOFT RESERVE resources to > >> the iomem resource tree, and methods for notifying the dax driver > >> of any newly added SOFT RESERVE resources. > >> > >> NOTE: As this is a RFC the temporary pr_err("CXL DEBUG...") messages > >> have been left in to aid in testing and validation. > >> > >> Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com> > >> Signed-off-by: Alison Schofield <alison.schofield@intel.com> > > > > Trivial, but why Alison's SoB? She didn't send the patch which would > > be normal reason for last SoB in a list like this (handling of the patch). > > > > Maybe a Co-developed-by is missing? > > Alison helped develop the code so a Co-developed-by should be here. > > > > > Whilst I appreciate this is an RFC, the arch specific bleed > > into CXL code needs to go. + we need some info here on whether this > > is an x86 specific issue. I'm fairly sure it's not and that should > > be called out as future work. > > As far as I know this is an x86 issue for handling the SOFT RESERVE > resources, if other arches are seeing this than we need to update the approach. Hmm. I've not fully understood where it is tripping up in the x86 code. Any EFI table using arch (arm64 etc) has soft reserved handling so a bios doing CXL enumeration might have chosen to mark the CXL memory with that. The rest of us just don't bounce through the ancient history of e820, It's more than possible that other architectures don't care that it was previously soft reserved on hotplug paths. I'm not sure. I guess I could hack a test but it will be a while... > > > > > Also, much of this is actually generic, with only a few bits > > being e820 related. Can we pull the soft reserve list etc out > > of them into generic code. > > Yes, we could do that. The handling of the soft reserve list doesn't > need to live in the e820 code. > > > > > To me the general approach seems reasonable but I'd suggest +CC linux-mm > > as the issue maybe more about the impacts on DAX than this working for CXL. > > ok, I'll add linux-mm. One of the pieces I wanted to get feedback on was the > dax component and how we could invoke the dax driver to consume any soft > reserves that remain after the cxl driver removes intersecting cxl region > resources. Understood. That bit I can't help with and we all know everyone else will back away faster than Dan Williams :) > > > >> + up_write(&cxl_sr_rwsem); > >> +} > >> + > >> +DECLARE_DELAYED_WORK(cxl_sr_work, cxl_sr_update); > >> + > >> +static void schedule_soft_reserve_update(void) > >> +{ > >> + static bool update_scheduled; > >> + int timeout = 5 * HZ; > >> + > >> + down_write(&cxl_sr_rwsem); > > For the locks use guard() > > Yes can switch to guard(). > > > > > However, what are they for? > > The concern is that we could have races on scheduling the soft reserve updates > and I wanted to try to keep only one work item on the queue. They may not be > needed, I was erring on the side of safety You only have one item, so it can't be there twice. If the aim is to protec that one item then maybe this makes sense. Jonathan
Hi guys, On 17/10/2024 17:53, Jonathan Cameron wrote: > On Thu, 17 Oct 2024 09:49:01 -0500 > "Fontenot, Nathan" <nafonten@amd.com> wrote: >> On 10/16/2024 10:43 AM, Jonathan Cameron wrote: >>> On Fri, 4 Oct 2024 13:17:54 -0500 >>> Nathan Fontenot <nathan.fontenot@amd.com> wrote: >>> >>>> Update handling of SOFT RESERVE iomem resources that intersect with >>>> CXL region resources to remove the intersections from the SOFT RESERVE >>>> resources. The current approach of leaving the SOFT RESERVE >>>> resource as is can cause failures during hotplug replace of CXL >>>> devices because the resource is not available for reuse after >>>> teardown. >>>> The approach sought is to trim out any pieces of SOFT RESERVE >>>> resources that intersect with CXL regions. To do this, first set >>>> aside any SOFT RESERVE resources that intersect with a CFMWS >>>> into a separate resource tree during e820__reserve_resources_late() >>>> that would have been otherwise added to the iomem resource tree. >>>> >>>> As CXL regions are created the cxl resource created for the new >>>> region is used to trim intersections from the SOFT RESERVE >>>> resources that were previously set aside. >>>> >>>> The next steps are to add any SOFT RESERVE resources remaining to the >>>> iomem resource tree after CXL device probe completes and to notify >>>> the dax driver so it may consume the added SOFT RESERVE resources. >>>> >>>> This patch includes the use of a delayed work queue to wait >>>> for CXL device probe completion and then have a worker thread add >>>> any remaining SOFT RESERVE resources to the iomem resource tree. >>>> >>>> Not in this patch is notification of the dax driver so it may consume >>>> the SOFT RESERVE resources. >>>> >>>> The goal of presenting this RFC is to drive discussion of the >>>> current approach for trimming SOFT RESERVE resources, the use of >>>> a delayed work queue to add remaining SOFT RESERVE resources to >>>> the iomem resource tree, and methods for notifying the dax driver >>>> of any newly added SOFT RESERVE resources. >>> Whilst I appreciate this is an RFC, the arch specific bleed >>> into CXL code needs to go. + we need some info here on whether this >>> is an x86 specific issue. I'm fairly sure it's not and that should >>> be called out as future work. >> >> As far as I know this is an x86 issue for handling the SOFT RESERVE >> resources, if other arches are seeing this than we need to update the approach. > > Hmm. I've not fully understood where it is tripping up in the x86 code. > Any EFI table using arch (arm64 etc) has soft reserved handling so > a bios doing CXL enumeration might have chosen to mark the CXL memory > with that. > > The rest of us just don't bounce through the ancient history of e820, > > It's more than possible that other architectures don't care that > it was previously soft reserved on hotplug paths. I'm not sure. > I guess I could hack a test but it will be a while... arm64 will try to block removal of memory that was described in the UEFI memory map. The reason is we can't modify the UEFI tables when they have bits we don't understand. (Should we preserve those bits if the memory comes back?) On a UEFI platform, the memory map is the only information about where memory is that we have, we can't fix it up with other things we know like x86 does, (or stir in another memory map we found near the ISA hole) These tables are treated as the static properties of the system that cannot be changed and is just handed over during kexec. (This also means we don't have to work around the lovecraftian matrix of bugs that could be introduced by kexecing via older kernels) To remove memory on arm64, it should have been detected via one of the hotplug mechanisms. I think this removal of early boot memory is going to be specific to x86. Thanks, James
diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h index 2e74a7f0e935..542a52871beb 100644 --- a/arch/x86/include/asm/e820/api.h +++ b/arch/x86/include/asm/e820/api.h @@ -44,6 +44,9 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); extern int e820__get_entry_type(u64 start, u64 end); +extern void e820__trim_soft_reserves(const struct resource *cxl_res); +extern void e820__insert_soft_reserves(void); + /* * Returns true iff the specified range [start,end) is completely contained inside * the ISA region. diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 4893d30ce438..855c26460bb9 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -1208,15 +1208,165 @@ static unsigned long __init ram_alignment(resource_size_t pos) #define MAX_RESOURCE_SIZE ((resource_size_t)-1) +static struct resource e820_sr_res = { + .name = "e820 Soft Reserves", + .start = 0, + .end = MAX_RESOURCE_SIZE, +}; + +void e820__insert_soft_reserves(void) +{ + struct resource *res, *next; + int rc; + + pr_err("CXL DEBUG Inserting Soft Reserves\n"); + for (res = e820_sr_res.child; res; res = next) { + next = res->sibling; + + pr_err("CXL DEBUG Inserting Soft Reserve %pr\n", res); + remove_resource(res); + rc = insert_resource(&iomem_resource, res); + if (rc) + pr_debug("CXL DEBUG Cannot insert %pr\n", res); + } +} +EXPORT_SYMBOL_GPL(e820__insert_soft_reserves); + +static void e820__add_soft_reserve(resource_size_t start, resource_size_t len, + unsigned long flags) +{ + struct resource *res; + + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) { + pr_err("CXL DEBUG Couldn't add Soft Reserved %llx (%llx)\n", + start, len); + return; + } + + *res = DEFINE_RES_NAMED(start, len, "Soft Reserved", flags); + res->desc = IORES_DESC_SOFT_RESERVED; + pr_err("CXL DEBUG Inserting new Soft Reserved %pr\n", res); + insert_resource(&e820_sr_res, res); +} + +static void e820__trim_soft_reserve(struct resource *res, + const struct resource *cxl_res) +{ + resource_size_t new_start, new_end; + int rc; + + pr_err("CXL DEBUG Trimming Soft Reserves for %pr\n", cxl_res); + + if (res->start == cxl_res->start && res->end == cxl_res->end) { + pr_err("CXL DEBUG Releasing resource %pr\n", res); + release_resource(res); + kfree(res); + } else if (res->start == cxl_res->start || res->end == cxl_res->end) { + if (res->start == cxl_res->start) { + new_start = cxl_res->end + 1; + new_end = res->end; + } else { + new_start = res->start; + new_end = cxl_res->start - 1; + } + + pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", + res, new_start, new_end); + rc = adjust_resource(res, new_start, new_end - new_start + 1); + if (rc) + pr_debug("Cannot adjust soft reserved resource %pr\n", + res); + } else { + new_start = res->start; + new_end = res->end; + + /*Adjust existing to beginning resource */ + pr_err("CXL DEBUG Adjusting resource %pr (%llx - %llx)\n", res, + new_start, cxl_res->start); + adjust_resource(res, new_start, cxl_res->start - new_start + 1); + + /* Add new resource for end piece */ + e820__add_soft_reserve(cxl_res->end + 1, new_end - cxl_res->end, + res->flags); + } +} + +void e820__trim_soft_reserves(const struct resource *cxl_res) +{ + struct resource *res, *next; + + pr_err("CXL DEBUG Trimming Soft Reserves\n"); + for (res = e820_sr_res.child; res; res = next) { + next = res->sibling; + + if (resource_contains(res, cxl_res)) { + e820__trim_soft_reserve(res, cxl_res); + break; + } + } +} +EXPORT_SYMBOL_GPL(e820__trim_soft_reserves); + +static int __init e820_parse_cfmws(union acpi_subtable_headers *hdr, void *arg, + const unsigned long unused) +{ + struct acpi_cedt_cfmws *cfmws; + struct resource *res = arg; + struct resource cfmws_res; + + /* Validation check, remove when finished debugging */ + if (!res->parent && res->end) + pr_err("CXL DEBUG Should insert %pr\n", res); + + if (res->parent || !res->end) + return 0; + + cfmws = (struct acpi_cedt_cfmws *)hdr; + cfmws_res = DEFINE_RES_MEM(cfmws->base_hpa, + cfmws->base_hpa + cfmws->window_size); + pr_err("CXL DEBUG Found CFMWS: %pr\n", &cfmws_res); + + if (resource_overlaps(&cfmws_res, res)) { + pr_err("CXL DEBUG Found SOFT RESERVE intersection %llx - %llx : %llx - %llx\n", + res->start, res->end, cfmws_res.start, cfmws_res.end); + e820__add_soft_reserve(res->start, resource_size(res), + res->flags); + return 1; + } + + return 0; +} + void __init e820__reserve_resources_late(void) { - int i; + int i, rc; struct resource *res; + /* + * Prior to inserting SOFT_RESERVED resources we want to check for an + * intersection with potential CXL resources. Any SOFT_RESERVED resources + * that do intersect a potential CXL resource are set aside so they + * can be trimmed to accommodate CXL resource intersections and added to + * the iomem resource tree after the CXL drivers have completed their + * device probe. + */ + pr_err("CXL DEBUG Checking e820 iomem resources\n"); + res = e820_res; for (i = 0; i < e820_table->nr_entries; i++) { - if (!res->parent && res->end) - insert_resource_expand_to_fit(&iomem_resource, res); + pr_err("CXL DEBUG Checking e820 iomem resource %llx - %llx\n", + res->start, res->end); + if (res->desc == IORES_DESC_SOFT_RESERVED) { + rc = acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, + e820_parse_cfmws, res); + if (rc) { + res++; + continue; + } + } + pr_err("CXL DEBUG Inserting %llx - %llx\n", res->start, res->end); + insert_resource_expand_to_fit(&iomem_resource, res); res++; } diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 21ad5f242875..539cccfffda0 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -14,6 +14,10 @@ #include <cxl.h> #include "core.h" +#if CONFIG_X86 +#include <asm/e820/api.h> +#endif + /** * DOC: cxl core region * @@ -3226,6 +3230,14 @@ static int match_region_by_range(struct device *dev, void *data) return rc; } +static int insert_region_resource(struct resource *parent, struct resource *res) +{ +#if CONFIG_X86 + e820__trim_soft_reserves(res); +#endif + return insert_resource(parent, res); +} + /* Establish an empty region covering the given HPA range */ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, struct cxl_endpoint_decoder *cxled) @@ -3272,7 +3284,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa), dev_name(&cxlr->dev)); - rc = insert_resource(cxlrd->res, res); + rc = insert_region_resource(cxlrd->res, res); if (rc) { /* * Platform-firmware may not have split resources like "System diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index d7d5d982ce69..9f94730c488f 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -7,6 +7,10 @@ #include "cxlmem.h" #include "cxlpci.h" +#if CONFIG_X86 +#include <asm/e820/api.h> +#endif + /** * DOC: cxl port * @@ -89,6 +93,35 @@ static int cxl_switch_port_probe(struct cxl_port *port) return -ENXIO; } +DECLARE_RWSEM(cxl_sr_rwsem); + +static void cxl_sr_update(struct work_struct *w) +{ + down_write(&cxl_sr_rwsem); + pr_err("CXL DEBUG Updating soft reserves\n"); + e820__insert_soft_reserves(); + up_write(&cxl_sr_rwsem); +} + +DECLARE_DELAYED_WORK(cxl_sr_work, cxl_sr_update); + +static void schedule_soft_reserve_update(void) +{ + static bool update_scheduled; + int timeout = 5 * HZ; + + down_write(&cxl_sr_rwsem); + if (update_scheduled) { + pr_err("CXL DEBUG modifying delayed work timeout\n"); + mod_delayed_work(system_wq, &cxl_sr_work, timeout); + } else { + pr_err("CXL DEBUG Adding delayed work\n"); + schedule_delayed_work(&cxl_sr_work, timeout); + update_scheduled = true; + } + up_write(&cxl_sr_rwsem); +} + static int cxl_endpoint_port_probe(struct cxl_port *port) { struct cxl_endpoint_dvsec_info info = { .port = port }; @@ -140,6 +173,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) */ device_for_each_child(&port->dev, root, discover_region); + schedule_soft_reserve_update(); return 0; }