Message ID | 159408717289.2385045.14094866475168644020.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ACPI/NVDIMM: Runtime Firmware Activation | expand |
Hi! > @@ -234,6 +234,7 @@ static const char * const pm_tests[__TEST_AFTER_LAST] = { > [TEST_PLATFORM] = "platform", > [TEST_DEVICES] = "devices", > [TEST_FREEZER] = "freezer", > + [TEST_MEM_QUIET] = "mem-quiet", > }; > This adds field to sysfs interface, right? I guess we need documentation for that.... Best regards, Pavel
On Tuesday, July 7, 2020 3:59:32 AM CEST Dan Williams wrote: > The runtime firmware activation capability of Intel NVDIMM devices > requires memory transactions to be disabled for 100s of microseconds. > This timeout is large enough to cause in-flight DMA to fail and other > application detectable timeouts. Arrange for firmware activation to be > executed while the system is "quiesced", all processes and device-DMA > frozen. > > It is already required that invoking device ->freeze() callbacks is > sufficient to cease DMA. A device that continues memory writes outside > of user-direction violates expectations of the PM core to be to > establish a coherent hibernation image. > > That said, RDMA devices are an example of a device that access memory > outside of user process direction. RDMA drivers also typically assume > the system they are operating in will never be hibernated. A solution > for RDMA collisions with firmware activation is outside the scope of > this change and may need to rely on being able to survive the platform > imposed memory controller quiesce period. Thanks for following my suggestion to use the hibernation infrastructure rather than the suspend one, but I think it would be better to go a bit further with that. Namely, after thinking about this a bit more I have come to the conclusion that what is needed is an ability to execute a function, inside of the kernel, in a "quiet" environment in which memory updates are unlikely. While the hibernation infrastructure as is can be used for that, kind of, IMO it would be cleaner to introduce a helper for that, like in the (untested) patch below, so if the "quiet execution environment" is needed, whoever needs it may simply pass a function to hibernate_quiet_exec() and provide whatever user-space I/F is suitable on top of that. Please let me know what you think. Cheers! --- include/linux/suspend.h | 6 ++ kernel/power/hibernate.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 103 insertions(+) Index: linux-pm/kernel/power/hibernate.c =================================================================== --- linux-pm.orig/kernel/power/hibernate.c +++ linux-pm/kernel/power/hibernate.c @@ -795,6 +795,103 @@ int hibernate(void) return error; } +/** + * hibernate_quiet_exec - Execute a function with all devices frozen. + * @func: Function to execute. + * @data: Data pointer to pass to @func. + * + * Return the @func return value or an error code if it cannot be executed. + */ +int hibernate_quiet_exec(int (*func)(void *data), void *data) +{ + int error, nr_calls = 0; + + lock_system_sleep(); + + if (!hibernate_acquire()) { + error = -EBUSY; + goto unlock; + } + + pm_prepare_console(); + + error = __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls); + if (error) { + nr_calls--; + goto exit; + } + + error = freeze_processes(); + if (error) + goto exit; + + lock_device_hotplug(); + + pm_suspend_clear_flags(); + + error = platform_begin(true); + if (error) + goto thaw; + + error = freeze_kernel_threads(); + if (error) + goto thaw; + + error = dpm_prepare(PMSG_FREEZE); + if (error) + goto dpm_complete; + + suspend_console(); + + error = dpm_suspend(PMSG_FREEZE); + if (error) + goto dpm_resume; + + error = dpm_suspend_end(PMSG_FREEZE); + if (error) + goto dpm_resume; + + error = platform_pre_snapshot(true); + if (error) + goto skip; + + error = func(data); + +skip: + platform_finish(true); + + dpm_resume_start(PMSG_THAW); + +dpm_resume: + dpm_resume(PMSG_THAW); + + resume_console(); + +dpm_complete: + dpm_complete(PMSG_THAW); + + thaw_kernel_threads(); + +thaw: + platform_end(true); + + unlock_device_hotplug(); + + thaw_processes(); + +exit: + __pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL); + + pm_restore_console(); + + hibernate_release(); + +unlock: + unlock_system_sleep(); + + return error; +} +EXPORT_SYMBOL_GPL(hibernate_quiet_exec); /** * software_resume - Resume from a saved hibernation image. Index: linux-pm/include/linux/suspend.h =================================================================== --- linux-pm.orig/include/linux/suspend.h +++ linux-pm/include/linux/suspend.h @@ -453,6 +453,8 @@ extern bool hibernation_available(void); asmlinkage int swsusp_save(void); extern struct pbe *restore_pblist; int pfn_is_nosave(unsigned long pfn); + +int hibernate_quiet_exec(int (*func)(void *data), void *data); #else /* CONFIG_HIBERNATION */ static inline void register_nosave_region(unsigned long b, unsigned long e) {} static inline void register_nosave_region_late(unsigned long b, unsigned long e) {} @@ -464,6 +466,10 @@ static inline void hibernation_set_ops(c static inline int hibernate(void) { return -ENOSYS; } static inline bool system_entering_hibernation(void) { return false; } static inline bool hibernation_available(void) { return false; } + +static inline hibernate_quiet_exec(int (*func)(void *data), void *data) { + return -ENOTSUPP; +} #endif /* CONFIG_HIBERNATION */ #ifdef CONFIG_HIBERNATION_SNAPSHOT_DEV
On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote: > The runtime firmware activation capability of Intel NVDIMM devices > requires memory transactions to be disabled for 100s of microseconds. > This timeout is large enough to cause in-flight DMA to fail and other > application detectable timeouts. Arrange for firmware activation to be > executed while the system is "quiesced", all processes and device-DMA > frozen. > > It is already required that invoking device ->freeze() callbacks is > sufficient to cease DMA. A device that continues memory writes outside > of user-direction violates expectations of the PM core to be to > establish a coherent hibernation image. > > That said, RDMA devices are an example of a device that access memory > outside of user process direction. RDMA drivers also typically assume > the system they are operating in will never be hibernated. A solution > for RDMA collisions with firmware activation is outside the scope of > this change and may need to rely on being able to survive the platform > imposed memory controller quiesce period. Yikes. I don't think we should support such a broken runtime firmware activation.
On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote: > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote: > > The runtime firmware activation capability of Intel NVDIMM devices > > requires memory transactions to be disabled for 100s of microseconds. > > This timeout is large enough to cause in-flight DMA to fail and other > > application detectable timeouts. Arrange for firmware activation to be > > executed while the system is "quiesced", all processes and device-DMA > > frozen. > > > > It is already required that invoking device ->freeze() callbacks is > > sufficient to cease DMA. A device that continues memory writes outside > > of user-direction violates expectations of the PM core to be to > > establish a coherent hibernation image. > > > > That said, RDMA devices are an example of a device that access memory > > outside of user process direction. Are you saying freeze doesn't work for some RDMA drivers? That would be a driver bug, I think. The consequences of doing freeze are pretty serious, but it should still stop DMA. Jason
On Thu, Jul 9, 2020 at 5:39 PM Jason Gunthorpe <jgg@mellanox.com> wrote: > > On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote: > > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote: > > > The runtime firmware activation capability of Intel NVDIMM devices > > > requires memory transactions to be disabled for 100s of microseconds. > > > This timeout is large enough to cause in-flight DMA to fail and other > > > application detectable timeouts. Arrange for firmware activation to be > > > executed while the system is "quiesced", all processes and device-DMA > > > frozen. > > > > > > It is already required that invoking device ->freeze() callbacks is > > > sufficient to cease DMA. A device that continues memory writes outside > > > of user-direction violates expectations of the PM core to be to > > > establish a coherent hibernation image. > > > > > > That said, RDMA devices are an example of a device that access memory > > > outside of user process direction. > > Are you saying freeze doesn't work for some RDMA drivers? That would > be a driver bug, I think. > > The consequences of doing freeze are pretty serious, but it should > still stop DMA. Yes, it should. The "freeze" callbacks are expected to prevent any DMA transfers from being carried out after they have been executed. Thanks!
On Thu, Jul 9, 2020 at 8:01 AM Christoph Hellwig <hch@infradead.org> wrote: > > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote: > > The runtime firmware activation capability of Intel NVDIMM devices > > requires memory transactions to be disabled for 100s of microseconds. > > This timeout is large enough to cause in-flight DMA to fail and other > > application detectable timeouts. Arrange for firmware activation to be > > executed while the system is "quiesced", all processes and device-DMA > > frozen. > > > > It is already required that invoking device ->freeze() callbacks is > > sufficient to cease DMA. A device that continues memory writes outside > > of user-direction violates expectations of the PM core to be to > > establish a coherent hibernation image. > > > > That said, RDMA devices are an example of a device that access memory > > outside of user process direction. RDMA drivers also typically assume > > the system they are operating in will never be hibernated. A solution > > for RDMA collisions with firmware activation is outside the scope of > > this change and may need to rely on being able to survive the platform > > imposed memory controller quiesce period. > > Yikes. I don't think we should support such a broken runtime firmware > activation. Yikes indeed, that matches my initial reaction. So I can say that the platform folks recognize that this situation is untenable and you can see in the interface that it is built to support future platforms that can activate without a memory quiesce period. The question is what to do in the meantime. It turns out that despite my initial skeptical reaction a significant number of users are willing to manage this quiesce (or even race this quiesce!) to avoid a server reboot which is basically guaranteed to knock a "9" off of a "5 nines" uptime system. My minimum requirement for supporting this in Linux was to at least have a safe way to mitigate the risks of a race and that's what led me to the hibernate path.
On Thu, Jul 9, 2020 at 8:39 AM Jason Gunthorpe <jgg@mellanox.com> wrote: > > On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote: > > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote: > > > The runtime firmware activation capability of Intel NVDIMM devices > > > requires memory transactions to be disabled for 100s of microseconds. > > > This timeout is large enough to cause in-flight DMA to fail and other > > > application detectable timeouts. Arrange for firmware activation to be > > > executed while the system is "quiesced", all processes and device-DMA > > > frozen. > > > > > > It is already required that invoking device ->freeze() callbacks is > > > sufficient to cease DMA. A device that continues memory writes outside > > > of user-direction violates expectations of the PM core to be to > > > establish a coherent hibernation image. > > > > > > That said, RDMA devices are an example of a device that access memory > > > outside of user process direction. > > Are you saying freeze doesn't work for some RDMA drivers? That would > be a driver bug, I think. Right, it's more my hunch than a known bug at this point, but in my experience with testing server class hardware when I've reported a power management bugs I've sometimes got the incredulous response "who suspends / hibernates servers!?". I can drop that comment. Are there protocol timeouts that might need to be adjusted for a 100s of microseconds blip in memory controller response? > The consequences of doing freeze are pretty serious, but it should > still stop DMA. Ok, and there is still the option to race the quiesce if the effects of the freeze are worse than a potential timeout from the quiesce.
On Thu, Jul 09, 2020 at 09:10:06AM -0700, Dan Williams wrote: > On Thu, Jul 9, 2020 at 8:39 AM Jason Gunthorpe <jgg@mellanox.com> wrote: > > > > On Thu, Jul 09, 2020 at 04:00:51PM +0100, Christoph Hellwig wrote: > > > On Mon, Jul 06, 2020 at 06:59:32PM -0700, Dan Williams wrote: > > > > The runtime firmware activation capability of Intel NVDIMM devices > > > > requires memory transactions to be disabled for 100s of microseconds. > > > > This timeout is large enough to cause in-flight DMA to fail and other > > > > application detectable timeouts. Arrange for firmware activation to be > > > > executed while the system is "quiesced", all processes and device-DMA > > > > frozen. > > > > > > > > It is already required that invoking device ->freeze() callbacks is > > > > sufficient to cease DMA. A device that continues memory writes outside > > > > of user-direction violates expectations of the PM core to be to > > > > establish a coherent hibernation image. > > > > > > > > That said, RDMA devices are an example of a device that access memory > > > > outside of user process direction. > > > > Are you saying freeze doesn't work for some RDMA drivers? That would > > be a driver bug, I think. > > Right, it's more my hunch than a known bug at this point, but in my > experience with testing server class hardware when I've reported a > power management bugs I've sometimes got the incredulous response "who > suspends / hibernates servers!?". I can drop that comment. > > Are there protocol timeouts that might need to be adjusted for a 100s > of microseconds blip in memory controller response? Survivability depends alot on HW support, it has to suspend, not discard DMAs that it needs to issue. Most likely things are as you say, and HW doesn't support safe short time suspend. The usual use of PM stuff here is to make the machine ready for kexec Jason
On Thu, Jul 9, 2020 at 7:57 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Tuesday, July 7, 2020 3:59:32 AM CEST Dan Williams wrote: > > The runtime firmware activation capability of Intel NVDIMM devices > > requires memory transactions to be disabled for 100s of microseconds. > > This timeout is large enough to cause in-flight DMA to fail and other > > application detectable timeouts. Arrange for firmware activation to be > > executed while the system is "quiesced", all processes and device-DMA > > frozen. > > > > It is already required that invoking device ->freeze() callbacks is > > sufficient to cease DMA. A device that continues memory writes outside > > of user-direction violates expectations of the PM core to be to > > establish a coherent hibernation image. > > > > That said, RDMA devices are an example of a device that access memory > > outside of user process direction. RDMA drivers also typically assume > > the system they are operating in will never be hibernated. A solution > > for RDMA collisions with firmware activation is outside the scope of > > this change and may need to rely on being able to survive the platform > > imposed memory controller quiesce period. > > Thanks for following my suggestion to use the hibernation infrastructure > rather than the suspend one, but I think it would be better to go a bit > further with that. > > Namely, after thinking about this a bit more I have come to the conclusion > that what is needed is an ability to execute a function, inside of the > kernel, in a "quiet" environment in which memory updates are unlikely. > > While the hibernation infrastructure as is can be used for that, kind of, IMO > it would be cleaner to introduce a helper for that, like in the (untested) > patch below, so if the "quiet execution environment" is needed, whoever > needs it may simply pass a function to hibernate_quiet_exec() and provide > whatever user-space I/F is suitable on top of that. > > Please let me know what you think. This looks good to me in concept. Would you expect that I trigger this from libnvdimm sysfs, or any future users of this functionality to trigger it through their own subsystem specific mechanisms? I have a place for it in libvdimm and could specify the activation method directly as "suspend" vs "live" activation.
On Thursday, July 9, 2020 9:04:30 PM CEST Dan Williams wrote: > On Thu, Jul 9, 2020 at 7:57 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > On Tuesday, July 7, 2020 3:59:32 AM CEST Dan Williams wrote: > > > The runtime firmware activation capability of Intel NVDIMM devices > > > requires memory transactions to be disabled for 100s of microseconds. > > > This timeout is large enough to cause in-flight DMA to fail and other > > > application detectable timeouts. Arrange for firmware activation to be > > > executed while the system is "quiesced", all processes and device-DMA > > > frozen. > > > > > > It is already required that invoking device ->freeze() callbacks is > > > sufficient to cease DMA. A device that continues memory writes outside > > > of user-direction violates expectations of the PM core to be to > > > establish a coherent hibernation image. > > > > > > That said, RDMA devices are an example of a device that access memory > > > outside of user process direction. RDMA drivers also typically assume > > > the system they are operating in will never be hibernated. A solution > > > for RDMA collisions with firmware activation is outside the scope of > > > this change and may need to rely on being able to survive the platform > > > imposed memory controller quiesce period. > > > > Thanks for following my suggestion to use the hibernation infrastructure > > rather than the suspend one, but I think it would be better to go a bit > > further with that. > > > > Namely, after thinking about this a bit more I have come to the conclusion > > that what is needed is an ability to execute a function, inside of the > > kernel, in a "quiet" environment in which memory updates are unlikely. > > > > While the hibernation infrastructure as is can be used for that, kind of, IMO > > it would be cleaner to introduce a helper for that, like in the (untested) > > patch below, so if the "quiet execution environment" is needed, whoever > > needs it may simply pass a function to hibernate_quiet_exec() and provide > > whatever user-space I/F is suitable on top of that. > > > > Please let me know what you think. > > This looks good to me in concept. > > Would you expect that I trigger this from libnvdimm sysfs, or any > future users of this functionality to trigger it through their own > subsystem specific mechanisms? Yes, I would. > I have a place for it in libvdimm and could specify the activation > method directly as "suspend" vs "live" activation. Sounds good to me. Cheers!
diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c index 0d346a307140..58c26281667c 100644 --- a/drivers/base/syscore.c +++ b/drivers/base/syscore.c @@ -108,6 +108,27 @@ void syscore_resume(void) trace_suspend_resume(TPS("syscore_resume"), 0, false); } EXPORT_SYMBOL_GPL(syscore_resume); + +/** + * syscore_mem_quiet - Execute callbacks that need memory to be quiet (as + * if prepared to be snapshotted for hibernate) + * + * This function is executed in the hibernate path after device freeze + * callbacks. + */ +void syscore_mem_quiet(void) +{ + struct syscore_ops *ops; + + list_for_each_entry(ops, &syscore_ops_list, node) { + if (!ops->mem_quiet) + continue; + if (initcall_debug) + pr_info("PM: Calling %pS\n", ops->mem_quiet); + ops->mem_quiet(); + } +} + #endif /* CONFIG_PM_SLEEP */ /** diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index 955265656b96..228b31f85c89 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -3,6 +3,7 @@ * Copyright(c) 2013-2015 Intel Corporation. All rights reserved. */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include <linux/syscore_ops.h> #include <linux/libnvdimm.h> #include <linux/sched/mm.h> #include <linux/vmalloc.h> @@ -1289,6 +1290,33 @@ static const struct file_operations nvdimm_fops = { .llseek = noop_llseek, }; +static void trigger_fw_activate(struct nvdimm_bus_descriptor *nd_desc) +{ + if (!nd_desc->fw_ops) + return; + nd_desc->fw_ops->activate(nd_desc); +} + +static void nvdimm_syscore_mem_quiet(void) +{ + struct nvdimm_bus *nvdimm_bus; + + mutex_lock(&nvdimm_bus_list_mutex); + list_for_each_entry(nvdimm_bus, &nvdimm_bus_list, list) { + struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc; + struct device *dev = &nvdimm_bus->dev; + + nvdimm_bus_lock(dev); + trigger_fw_activate(nd_desc); + nvdimm_bus_unlock(dev); + } + mutex_unlock(&nvdimm_bus_list_mutex); +} + +static struct syscore_ops nvdimm_syscore_ops = { + .mem_quiet = nvdimm_syscore_mem_quiet, +}; + int __init nvdimm_bus_init(void) { int rc; @@ -1317,6 +1345,8 @@ int __init nvdimm_bus_init(void) if (rc) goto err_nd_bus; + register_syscore_ops(&nvdimm_syscore_ops); + return 0; err_nd_bus: diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c index b1cc7b35bd51..0cbb5620cd45 100644 --- a/drivers/nvdimm/core.c +++ b/drivers/nvdimm/core.c @@ -417,6 +417,9 @@ static ssize_t firmware_activate_store(struct device *dev, if (!nd_desc->fw_ops) return -EOPNOTSUPP; + if (!sysfs_streq(buf, "activate")) + return -EINVAL; + nvdimm_bus_lock(dev); state = nd_desc->fw_ops->activate_state(nd_desc); diff --git a/include/linux/syscore_ops.h b/include/linux/syscore_ops.h index ae4d48e4c970..b57f7a93de20 100644 --- a/include/linux/syscore_ops.h +++ b/include/linux/syscore_ops.h @@ -15,6 +15,7 @@ struct syscore_ops { int (*suspend)(void); void (*resume)(void); void (*shutdown)(void); + void (*mem_quiet)(void); }; extern void register_syscore_ops(struct syscore_ops *ops); @@ -22,6 +23,7 @@ extern void unregister_syscore_ops(struct syscore_ops *ops); #ifdef CONFIG_PM_SLEEP extern int syscore_suspend(void); extern void syscore_resume(void); +extern void syscore_mem_quiet(void); #endif extern void syscore_shutdown(void); diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index 02ec716a4927..ccf2268b9f27 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -118,11 +118,18 @@ EXPORT_SYMBOL(system_entering_hibernation); #ifdef CONFIG_PM_DEBUG static void hibernation_debug_sleep(void) { + /* + * The mem-quiet test state wants to get to syscore_mem_quiet() + * as quickly as possible, skip all debug timeouts by default. + */ + if (pm_test_level == TEST_MEM_QUIET) + return; + pr_info("debug: Waiting for 5 seconds.\n"); mdelay(5000); } -static int hibernation_test(int level) +int hibernation_test(int level) { if (pm_test_level == level) { hibernation_debug_sleep(); @@ -130,9 +137,7 @@ static int hibernation_test(int level) } return 0; } -#else /* !CONFIG_PM_DEBUG */ -static int hibernation_test(int level) { return 0; } -#endif /* !CONFIG_PM_DEBUG */ +#endif /* CONFIG_PM_DEBUG */ /** * platform_begin - Call platform to start hibernation. @@ -400,11 +405,13 @@ int hibernation_snapshot(int platform_mode) error = dpm_suspend(PMSG_FREEZE); - if (error || hibernation_test(TEST_DEVICES)) + if (error || hibernation_test(TEST_DEVICES) || hibernation_test(TEST_MEM_QUIET)) platform_recover(platform_mode); else error = create_image(platform_mode); + syscore_mem_quiet(); + /* * In the case that we call create_image() above, the control * returns here (1) after the image has been created or the diff --git a/kernel/power/main.c b/kernel/power/main.c index 40f86ec4ab30..e1cebd0694b3 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -234,6 +234,7 @@ static const char * const pm_tests[__TEST_AFTER_LAST] = { [TEST_PLATFORM] = "platform", [TEST_DEVICES] = "devices", [TEST_FREEZER] = "freezer", + [TEST_MEM_QUIET] = "mem-quiet", }; static ssize_t pm_test_show(struct kobject *kobj, struct kobj_attribute *attr, diff --git a/kernel/power/power.h b/kernel/power/power.h index ba2094db6294..de7dd36865a5 100644 --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -233,6 +233,7 @@ enum { TEST_PLATFORM, TEST_DEVICES, TEST_FREEZER, + TEST_MEM_QUIET, /* keep last */ __TEST_AFTER_LAST }; @@ -246,6 +247,12 @@ extern int pm_test_level; #define pm_test_level (TEST_NONE) #endif +#ifdef CONFIG_PM_DEBUG +int hibernation_test(int level); +#else /* !CONFIG_PM_DEBUG */ +static inline int hibernation_test(int level) { return 0; } +#endif /* !CONFIG_PM_DEBUG */ + #ifdef CONFIG_SUSPEND_FREEZER static inline int suspend_freeze_processes(void) { diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 881128b9351e..82fbc8150340 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1080,6 +1080,13 @@ int create_basic_memory_bitmaps(void) struct memory_bitmap *bm1, *bm2; int error = 0; + /* + * No need to prep bitmaps in the case when the hibernate is + * just being used trigger a memory quiesce point. + */ + if (hibernation_test(TEST_MEM_QUIET)) + return 0; + if (forbidden_pages_map && free_pages_map) return 0; else @@ -1129,6 +1136,9 @@ void free_basic_memory_bitmaps(void) { struct memory_bitmap *bm1, *bm2; + if (hibernation_test(TEST_MEM_QUIET)) + return; + if (WARN_ON(!(forbidden_pages_map && free_pages_map))) return; @@ -1702,6 +1712,9 @@ int hibernate_preallocate_memory(void) ktime_t start, stop; int error; + if (hibernation_test(TEST_MEM_QUIET)) + return 0; + pr_info("Preallocating image memory\n"); start = ktime_get(); diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 8b1bb5ee7e5d..ffaef2938ece 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -323,7 +323,17 @@ MODULE_PARM_DESC(pm_test_delay, static int suspend_test(int level) { #ifdef CONFIG_PM_DEBUG - if (pm_test_level == level) { + /* + * The mem-quiet state has special meaning for the hibernate + * path for the suspend path just treat it the same as + * TEST_DEVICES + */ + int test_level = pm_test_level; + + if (test_level == TEST_MEM_QUIET) + test_level = TEST_DEVICES; + + if (test_level == level) { pr_info("suspend debug: Waiting for %d second(s).\n", pm_test_delay); mdelay(pm_test_delay * 1000);
The runtime firmware activation capability of Intel NVDIMM devices requires memory transactions to be disabled for 100s of microseconds. This timeout is large enough to cause in-flight DMA to fail and other application detectable timeouts. Arrange for firmware activation to be executed while the system is "quiesced", all processes and device-DMA frozen. It is already required that invoking device ->freeze() callbacks is sufficient to cease DMA. A device that continues memory writes outside of user-direction violates expectations of the PM core to be to establish a coherent hibernation image. That said, RDMA devices are an example of a device that access memory outside of user process direction. RDMA drivers also typically assume the system they are operating in will never be hibernated. A solution for RDMA collisions with firmware activation is outside the scope of this change and may need to rely on being able to survive the platform imposed memory controller quiesce period. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [rafael: move from suspend to hibernate post-freeze callback] Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/base/syscore.c | 21 +++++++++++++++++++++ drivers/nvdimm/bus.c | 30 ++++++++++++++++++++++++++++++ drivers/nvdimm/core.c | 3 +++ include/linux/syscore_ops.h | 2 ++ kernel/power/hibernate.c | 17 ++++++++++++----- kernel/power/main.c | 1 + kernel/power/power.h | 7 +++++++ kernel/power/snapshot.c | 13 +++++++++++++ kernel/power/suspend.c | 12 +++++++++++- 9 files changed, 100 insertions(+), 6 deletions(-)