Message ID | 20230711174050.603820-1-david@redhat.com (mailing list archive) |
---|---|
State | New |
Series | [v1] mm/memory_hotplug: document the signal_pending() check in offline_pages() |
On Tue 11-07-23 19:40:50, David Hildenbrand wrote:
> Let's update the documentation that any signal is sufficient, and
> add a comment that not only checking for fatal signals is historical
> baggage: changing it now could break existing user space, although
> unlikely.
>
> For example, when an app provides a custom SIGALRM handler and triggers
> memory offlining, the timeout cmd would no longer stop memory offlining,
> because SIGALRM would no longer be considered a fatal signal.

Yes, and it is likely good to mention here that this is an antipattern
for many other kernel operations like IO (e.g. write) but it is a long
term behavior that somebody might depend on and it is safer to reflect
the documentation to the reality rather than the other way around (which
would be imho better).

> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
>  mm/memory_hotplug.c                             | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index 1b02fe5807cc..bd77841041af 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
>  (-> BUG), memory offlining will keep retrying until it eventually succeeds.
>
>  When offlining is triggered from user space, the offlining context can be
> -terminated by sending a fatal signal. A timeout based offlining can easily be
> +terminated by sending a signal. A timeout based offlining can easily be
>  implemented via::
>
>  	% timeout $TIMEOUT offline_block | failure_handling
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3f231cf1b410..7cfd13c91568 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	do {
>  		pfn = start_pfn;
>  		do {
> +			/*
> +			 * Historically we always checked for any signal and
> +			 * can't limit it to fatal signals without eventually
> +			 * breaking user space.
> +			 */
>  			if (signal_pending(current)) {
>  				ret = -EINTR;
>  				reason = "signal backoff";
> --
> 2.41.0
On 7/11/23 23:10, David Hildenbrand wrote:
> Let's update the documentation that any signal is sufficient, and
> add a comment that not only checking for fatal signals is historical
> baggage: changing it now could break existing user space, although
> unlikely.
>
> For example, when an app provides a custom SIGALRM handler and triggers
> memory offlining, the timeout cmd would no longer stop memory offlining,
> because SIGALRM would no longer be considered a fatal signal.
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
>  mm/memory_hotplug.c                             | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index 1b02fe5807cc..bd77841041af 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
>  (-> BUG), memory offlining will keep retrying until it eventually succeeds.
>
>  When offlining is triggered from user space, the offlining context can be
> -terminated by sending a fatal signal. A timeout based offlining can easily be
> +terminated by sending a signal. A timeout based offlining can easily be
>  implemented via::
>
>  	% timeout $TIMEOUT offline_block | failure_handling
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 3f231cf1b410..7cfd13c91568 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	do {
>  		pfn = start_pfn;
>  		do {
> +			/*
> +			 * Historically we always checked for any signal and
> +			 * can't limit it to fatal signals without eventually
> +			 * breaking user space.
> +			 */

Just curious, could 'signal type' to stop memory offline process be considered
an ABI and cannot be changed in kernel ever if required? Just wondering if an
additional '!fatal_signal_pending()' check could be introduced to warn about
support being deprecated, before finally replacing it with fatal_signal_pending().

>  			if (signal_pending(current)) {
>  				ret = -EINTR;
>  				reason = "signal backoff";
On 11.07.23 22:47, Michal Hocko wrote:
> On Tue 11-07-23 19:40:50, David Hildenbrand wrote:
>> Let's update the documentation that any signal is sufficient, and
>> add a comment that not only checking for fatal signals is historical
>> baggage: changing it now could break existing user space, although
>> unlikely.
>>
>> For example, when an app provides a custom SIGALRM handler and triggers
>> memory offlining, the timeout cmd would no longer stop memory offlining,
>> because SIGALRM would no longer be considered a fatal signal.
>
> Yes, and it is likely good to mention here that this is an antipattern
> for many other kernel operations like IO (e.g. write) but it is a long
> term behavior that somebody might depend on and it is safer to reflect
> the documentation to the reality rather than the other way around (which
> would be imho better).
>

You mean adding something like

"Note that using signal_pending() instead of fatal_signal_pending() is an
anti-pattern, but slowly deprecating that behavior to eventually change it
in the far future is probably not worth the effort. If this ever becomes
relevant for user-space, we might want to rethink."

?

Thanks!
On 12.07.23 08:47, Anshuman Khandual wrote:
>
>
> On 7/11/23 23:10, David Hildenbrand wrote:
>> Let's update the documentation that any signal is sufficient, and
>> add a comment that not only checking for fatal signals is historical
>> baggage: changing it now could break existing user space, although
>> unlikely.
>>
>> For example, when an app provides a custom SIGALRM handler and triggers
>> memory offlining, the timeout cmd would no longer stop memory offlining,
>> because SIGALRM would no longer be considered a fatal signal.
>>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
>>  mm/memory_hotplug.c                             | 5 +++++
>>  2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
>> index 1b02fe5807cc..bd77841041af 100644
>> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
>> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
>> @@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
>>  (-> BUG), memory offlining will keep retrying until it eventually succeeds.
>>
>>  When offlining is triggered from user space, the offlining context can be
>> -terminated by sending a fatal signal. A timeout based offlining can easily be
>> +terminated by sending a signal. A timeout based offlining can easily be
>>  implemented via::
>>
>>  	% timeout $TIMEOUT offline_block | failure_handling
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 3f231cf1b410..7cfd13c91568 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>  	do {
>>  		pfn = start_pfn;
>>  		do {
>> +			/*
>> +			 * Historically we always checked for any signal and
>> +			 * can't limit it to fatal signals without eventually
>> +			 * breaking user space.
>> +			 */
>
> Just curious, could 'signal type' to stop memory offline process be considered
> an ABI and cannot be changed in kernel ever if required? Just wondering if an
> additional '!fatal_signal_pending()' check could be introduced to warn about
> support being deprecated, before finally replacing it with fatal_signal_pending().

See my reply to Michal: while that would be doable, it is probably not worth
the effort, and we'd still have to stick with the existing handling for quite
a while.

Thanks!
On Wed 12-07-23 21:09:25, David Hildenbrand wrote:
> On 11.07.23 22:47, Michal Hocko wrote:
> > On Tue 11-07-23 19:40:50, David Hildenbrand wrote:
> > > Let's update the documentation that any signal is sufficient, and
> > > add a comment that not only checking for fatal signals is historical
> > > baggage: changing it now could break existing user space, although
> > > unlikely.
> > >
> > > For example, when an app provides a custom SIGALRM handler and triggers
> > > memory offlining, the timeout cmd would no longer stop memory offlining,
> > > because SIGALRM would no longer be considered a fatal signal.
> >
> > Yes, and it is likely good to mention here that this is an antipattern
> > for many other kernel operations like IO (e.g. write) but it is a long
> > term behavior that somebody might depend on and it is safer to reflect
> > the documentation to the reality rather than the other way around (which
> > would be imho better).
> >
>
> You mean adding something like
>
> "Note that using signal_pending() instead of fatal_signal_pending() is an
> anti-pattern, but slowly deprecating that behavior to eventually change it
> in the far future is probably not worth the effort. If this ever becomes
> relevant for user-space, we might want to rethink."

Yes, something like that. Thanks!
diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..bd77841041af 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -669,7 +669,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE
 (-> BUG), memory offlining will keep retrying until it eventually succeeds.
 
 When offlining is triggered from user space, the offlining context can be
-terminated by sending a fatal signal. A timeout based offlining can easily be
+terminated by sending a signal. A timeout based offlining can easily be
 implemented via::
 
 	% timeout $TIMEOUT offline_block | failure_handling
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3f231cf1b410..7cfd13c91568 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1843,6 +1843,11 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	do {
 		pfn = start_pfn;
 		do {
+			/*
+			 * Historically we always checked for any signal and
+			 * can't limit it to fatal signals without eventually
+			 * breaking user space.
+			 */
 			if (signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
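Editorial illustration, not part of the patch: the `-EINTR` returned by the `signal_pending()` check is presumably what a user-space caller sees as `errno == EINTR` when any handled signal arrives while offlining is in progress. Memory offlining itself needs root and a hot-removable memory block, so the sketch below uses a stand-in, a `read()` blocked on an empty pipe, to show the same user-visible effect: a custom SIGALRM handler installed without `SA_RESTART` interrupts the blocking call even though the signal is not fatal. The names `on_alarm` and `interrupted_read_errno` are hypothetical helpers introduced only for this example.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Empty handler: it only makes SIGALRM a handled, non-fatal signal. */
static void on_alarm(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper (assumption, not a kernel or libc API).
 * Blocks in read() on an empty pipe while a SIGALRM timer is armed.
 * Returns the errno of the interrupted read(), 0 if read() returned
 * without error, or -1 if the setup failed.
 */
int interrupted_read_errno(long timeout_ms)
{
	struct sigaction sa, old_sa;
	struct itimerval timer, off;
	int fds[2];
	char byte;
	int err = 0;

	if (timeout_ms <= 0 || pipe(fds) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* no SA_RESTART: read() fails with EINTR */
	if (sigaction(SIGALRM, &sa, &old_sa) != 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* Periodic timer, so a signal that fires before read() is retried. */
	memset(&timer, 0, sizeof(timer));
	timer.it_value.tv_sec = timeout_ms / 1000;
	timer.it_value.tv_usec = (timeout_ms % 1000) * 1000;
	timer.it_interval = timer.it_value;
	if (setitimer(ITIMER_REAL, &timer, NULL) != 0) {
		sigaction(SIGALRM, &old_sa, NULL);
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* The write end stays open and unused, so this blocks until SIGALRM. */
	if (read(fds[0], &byte, 1) < 0)
		err = errno;

	/* Disarm the timer before restoring the previous disposition. */
	memset(&off, 0, sizeof(off));
	setitimer(ITIMER_REAL, &off, NULL);
	sigaction(SIGALRM, &old_sa, NULL);
	close(fds[0]);
	close(fds[1]);
	return err;
}
```

Calling `interrupted_read_errno(50)` should report `EINTR`. If the kernel checked only for fatal signals, as discussed in the thread, a handled SIGALRM like this one would no longer stop the operation.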
Let's update the documentation that any signal is sufficient, and
add a comment that not only checking for fatal signals is historical
baggage: changing it now could break existing user space, although
unlikely.

For example, when an app provides a custom SIGALRM handler and triggers
memory offlining, the timeout cmd would no longer stop memory offlining,
because SIGALRM would no longer be considered a fatal signal.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 Documentation/admin-guide/mm/memory-hotplug.rst | 2 +-
 mm/memory_hotplug.c                             | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)
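Editorial illustration, not part of the patch: the SIGALRM argument in the commit message rests on the fact that installing a handler changes whether a signal counts as fatal. Roughly, a signal is fatal when its disposition is still the default and the default action terminates the process; SIGKILL is always fatal. The sketch below approximates that rule from user space. It is a simplification under stated assumptions: it ignores blocked signals, ptrace, and special cases such as init, and `default_action_terminates`, `would_be_fatal`, and `install_noop_handler` are hypothetical names, not kernel or libc APIs.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static void noop_handler(int sig)
{
	(void)sig;
}

/* Signals whose default action is to ignore or to stop are never fatal. */
bool default_action_terminates(int sig)
{
	switch (sig) {
	case SIGCHLD:
	case SIGCONT:
	case SIGURG:
	case SIGWINCH:	/* ignored by default */
	case SIGSTOP:
	case SIGTSTP:
	case SIGTTIN:
	case SIGTTOU:	/* stop the process by default */
		return false;
	default:
		return true;
	}
}

/*
 * Approximation (assumption): would delivering `sig` to this process
 * be treated as a fatal signal, given its current disposition?
 */
bool would_be_fatal(int sig)
{
	struct sigaction cur;

	if (sig == SIGKILL)
		return true;	/* cannot be caught or ignored */
	if (sigaction(sig, NULL, &cur) != 0)
		return false;	/* invalid signal number */
	return cur.sa_handler == SIG_DFL && default_action_terminates(sig);
}

/* Install a do-nothing handler, as the app in the commit message might. */
int install_noop_handler(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = noop_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}
```

With the default disposition, `would_be_fatal(SIGALRM)` is true; after `install_noop_handler(SIGALRM)` it becomes false. That is the case the commit message describes: a fatal-only check would let such an app keep offlining past its timeout.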