[v6] fs: Improve eventpoll logging to stop indicting timerfd

Message ID 20240703214315.454407-1-isaacmanjarres@google.com (mailing list archive)
State New

Commit Message

Isaac Manjarres July 3, 2024, 9:43 p.m. UTC
From: Manish Varma <varmam@google.com>

We'll often see aborted suspend operations that look like:

 PM: suspend entry 2024-07-03 15:55:15.372419634 UTC
 PM: PM: Pending Wakeup Sources: [timerfd]
 Abort: Pending Wakeup Sources: [timerfd]
 PM: suspend exit 2024-07-03 15:55:15.445281857 UTC

From this, it seems a timerfd caused the abort, but that can be
confusing, as timerfds don't create wakeup sources. However,
eventpoll can, and when it does, it names them after the underlying
file descriptor. Unfortunately, all the file descriptors are called
"[timerfd]", and a system may have many timerfds, so this isn't very
useful to debug what's going on to cause the suspend to abort.

To improve this, change the way eventpoll wakeup sources are named:

1) The top-level per-process eventpoll wakeup source is now named
"epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
and P is the PID of the creating process.

2) Individual eventpoll item wakeup sources are now named
"epollitemN:P.F", where N is a unique ID token, P is the PID of the creating
process, and F is the name of the underlying file descriptor.
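
As a rough userspace illustration of the two name formats described above (not the kernel code itself), the sketch below builds both names with snprintf(). The helpers next_wakesource_id(), format_epoll_name() and format_epollitem_name() are hypothetical names made up for this example, and C11 atomics stand in for the kernel's atomic_t. Note that the diff in this patch passes "epoll:%d:%d" for the top-level source, which would yield "epoll:N:P"; the sketch follows the commit text instead.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's wakesource_create_id counter. */
static atomic_int wakesource_create_id;

/* Return the next ID; like atomic_inc_return(), this yields the new value. */
static int next_wakesource_id(void)
{
	return atomic_fetch_add(&wakesource_create_id, 1) + 1;
}

/* Build "epollN:P" for the top-level eventpoll wakeup source. */
static int format_epoll_name(char *buf, size_t len, int id, int pid)
{
	return snprintf(buf, len, "epoll%d:%d", id, pid);
}

/* Build "epollitemN:P.F" for a single eventpoll item. */
static int format_epollitem_name(char *buf, size_t len, int id, int pid,
				 const char *file_name)
{
	return snprintf(buf, len, "epollitem%d:%d.%s", id, pid, file_name);
}
```

Because one counter feeds both kinds of names, the N values of a process's items are not necessarily consecutive; they are only unique and increasing.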

Now, when the scenario described earlier is encountered, the following
kernel logs are emitted:

 PM: suspend entry 2024-07-03 15:39:24.945791824 UTC
 PM: PM: Pending Wakeup Sources: epollitem30:6375.[timerfd]
 Abort: Pending Wakeup Sources: epollitem30:6375.[timerfd]
 PM: suspend exit 2024-07-03 15:39:25.017775019 UTC

There are various benefits to this new naming convention:

1) It is clear that the wakeup source is linked to an eventpoll
item.

2) Now that the PID of the process associated with that timerfd
instance is known, it is easy to map the PID of the process to the
name of the process. With this information, it is easy to start
debugging which process is causing this issue to occur.

3) Even if process 6375 creates multiple timerfd instances, the
ID token is useful in identifying which timerfd instance associated
with the process is causing suspend to abort, as it is monotonically
increasing. So if the order in which the process created its timerfd
instances is known, then one can pinpoint which timerfd instance is
causing this issue.
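
To make benefit 2) concrete, here is a minimal userspace sketch of how a debugging tool might split a reported name back into its parts. The struct and the function parse_epollitem_name() are hypothetical and exist only for this example; the sketch also assumes the file name contains no whitespace and fits in 63 bytes.

```c
#include <stdio.h>

/* Fields recovered from an "epollitemN:P.F" wakeup source name. */
struct epollitem_name {
	int id;        /* N: unique ID token */
	int pid;       /* P: PID of the creating process */
	char file[64]; /* F: name of the underlying file */
};

/* Returns 0 on success, -1 if the string does not match the format. */
static int parse_epollitem_name(const char *s, struct epollitem_name *out)
{
	if (sscanf(s, "epollitem%d:%d.%63s",
		   &out->id, &out->pid, out->file) != 3)
		return -1;
	return 0;
}
```

With the PID in hand, the process name can then be looked up (for example via /proc/<pid>/comm) for as long as that process is still alive.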

Co-developed-by: Kelly Rossmoyer <krossmo@google.com>
Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Signed-off-by: Manish Varma <varmam@google.com>
Co-developed-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
---
 drivers/base/power/wakeup.c | 12 +++++++++---
 fs/eventpoll.c              | 11 +++++++++--
 include/linux/pm_wakeup.h   |  8 ++++----
 3 files changed, 22 insertions(+), 9 deletions(-)

 v1 -> v2:
 - Renamed instance count to wakesource_create_id to better describe
   its purpose.
 - Changed the wakeup source naming convention for wakeup sources
   created by eventpoll to avoid changing the timerfd names.
 - Used the PID of the process instead of the process name for the
   sake of uniqueness when creating wakeup sources.

v2 -> v3:
 - Changed wakeup_source_register() to take in a format string
   and arguments to avoid duplicating code to construct wakeup
   source names.
 - Moved the definition of wakesource_create_id so that it is
   always defined to fix a compilation error.

v3 -> v4:
 - Changed the naming convention for the top-level epoll wakeup
   sources to include an ID for uniqueness. This is needed in
   cases where a process is using two epoll fds.
 - Edited commit log to reflect new changes and add new tags.

v4 -> v5:
 - Added the format attribute to the wakeup_source_register()
   function to address a warning from the kernel test robot:
   https://lore.kernel.org/all/202406050504.UvdlPAQ0-lkp@intel.com/

v5 -> v6:
 - Reworded the commit text to clarify the scenarios in which this
   patch is helpful, as per feedback from
   John Stultz <jstultz@google.com>
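
The v3 and v5 changes (format string plus format attribute) can be sketched in userspace as follows. This is only an analogue under stated assumptions: build_ws_name() and WS_NAME_LEN are hypothetical names, and the GCC/Clang attribute is spelled out where the kernel uses __printf(2, 3). Unlike the patch, which ignores the vsnprintf() return value, the sketch reports when a name would not fit in the 128-byte buffer.

```c
#include <stdarg.h>
#include <stdio.h>

#define WS_NAME_LEN 128 /* same size as the on-stack buffer in the patch */

/*
 * Format a wakeup source name into a fixed-size buffer. The format
 * attribute lets the compiler check the variadic arguments against fmt.
 * Returns 0 on success, -1 on a formatting error or if the name was
 * truncated to fit the buffer.
 */
__attribute__((format(printf, 2, 3)))
static int build_ws_name(char *name, const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(name, WS_NAME_LEN, fmt, args);
	va_end(args);

	return (n < 0 || n >= WS_NAME_LEN) ? -1 : 0;
}
```

With the attribute in place, a call such as build_ws_name(name, "%s", 42) draws a -Wformat warning at compile time, which is the class of problem the kernel test robot report was about.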

Comments

Rafael J. Wysocki July 4, 2024, 12:31 p.m. UTC | #1
On Wed, Jul 3, 2024 at 11:43 PM Isaac J. Manjarres
<isaacmanjarres@google.com> wrote:
> ---
>  drivers/base/power/wakeup.c | 12 +++++++++---

For the changes in wakeup.c

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

Christian Brauner July 4, 2024, 2:03 p.m. UTC | #2
On Wed, Jul 03, 2024 at 02:43:14PM GMT, Isaac J. Manjarres wrote:
> From: Manish Varma <varmam@google.com>
> 
> We'll often see aborted suspend operations that look like:
> 
>  PM: suspend entry 2024-07-03 15:55:15.372419634 UTC
>  PM: PM: Pending Wakeup Sources: [timerfd]
>  Abort: Pending Wakeup Sources: [timerfd]
>  PM: suspend exit 2024-07-03 15:55:15.445281857 UTC
> 
> From this, it seems a timerfd caused the abort, but that can be
> confusing, as timerfds don't create wakeup sources. However,
> eventpoll can, and when it does, it names them after the underlying
> file descriptor. Unfortunately, all the file descriptors are called
> "[timerfd]", and a system may have many timerfds, so this isn't very
> useful to debug what's going on to cause the suspend to abort.
> 
> To improve this, change the way eventpoll wakeup sources are named:
> 
> 1) The top-level per-process eventpoll wakeup source is now named
> "epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
> and P is the PID of the creating process.
> 
> 2) Individual eventpoll item wakeup sources are now named
> "epollitemN:P.F", where N is a unique ID token, P is the PID of the creating
> process, and F is the name of the underlying file descriptor.

FYI, that PID is meaningless or even actively misleading in the face of
pid namespaces. And since such wakeups seem to be registered in sysfs
globally, they are visible to all containers. That means a container will
now see some timerfd wakeup source with a PID that might just accidentally
correspond to a process inside the container. Which in turn also means
you're leaking the info about the creating process into the container.
IOW, if PID 1 ends up registering some wakeup source the container gets
to know about it.
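
To illustrate the point about pid namespaces: the patch records task_pid_nr(current), which is the PID as seen from the initial namespace, while a process inside a container knows itself under a different number. From userspace, the NSpid line of /proc/<pid>/status lists one PID per namespace level. The sketch below parses such a line; parse_nspid_line() is a hypothetical helper written for this example, not an existing API.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse an "NSpid:" line as found in /proc/<pid>/status. The leftmost
 * value is the PID in the namespace of the procfs mount; each later value
 * is the PID in a successively nested namespace. Returns the number of
 * PIDs stored in pids[], or -1 if the line is not an NSpid line.
 */
static int parse_nspid_line(const char *line, long *pids, int max)
{
	const char *p;
	char *end;
	int n = 0;

	if (strncmp(line, "NSpid:", 6) != 0)
		return -1;

	p = line + 6;
	while (n < max) {
		long v = strtol(p, &end, 10);

		if (end == p) /* no more numbers on the line */
			break;
		pids[n++] = v;
		p = end;
	}
	return n;
}
```

For a line such as "NSpid: 6375 1", the same task is PID 6375 on the host and PID 1 inside its container, so a name containing 6375 can be matched against the wrong process when read from inside a container.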

Isaac Manjarres July 9, 2024, 9:04 p.m. UTC | #3
On Thu, Jul 04, 2024 at 04:03:59PM +0200, Christian Brauner wrote:
> On Wed, Jul 03, 2024 at 02:43:14PM GMT, Isaac J. Manjarres wrote:
> > From: Manish Varma <varmam@google.com>
> > 
> > We'll often see aborted suspend operations that look like:
> > 
> >  PM: suspend entry 2024-07-03 15:55:15.372419634 UTC
> >  PM: PM: Pending Wakeup Sources: [timerfd]
> >  Abort: Pending Wakeup Sources: [timerfd]
> >  PM: suspend exit 2024-07-03 15:55:15.445281857 UTC
> > 
> > From this, it seems a timerfd caused the abort, but that can be
> > confusing, as timerfds don't create wakeup sources. However,
> > eventpoll can, and when it does, it names them after the underlying
> > file descriptor. Unfortunately, all the file descriptors are called
> > "[timerfd]", and a system may have many timerfds, so this isn't very
> > useful to debug what's going on to cause the suspend to abort.
> > 
> > To improve this, change the way eventpoll wakeup sources are named:
> > 
> > 1) The top-level per-process eventpoll wakeup source is now named
> > "epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
> > and P is the PID of the creating process.
> > 
> > 2) Individual eventpoll item wakeup sources are now named
> > "epollitemN:P.F", where N is a unique ID token, P is the PID of the creating
> > process, and F is the name of the underlying file descriptor.
> 
> Fyi, that PID is meaningless or even actively misleading in the face of
> pid namespaces. And since such wakeups seem to be registered in sysfs
> globally they are visible to all containers. That means a container will
> now see some timerfd wakeup source with a PID that might just accidentally
> correspond to a process inside the container. Which in turn also means

Thanks for your feedback on this, Christian. With regard to this
scenario: would it be useful to use a namespace ID, along with the PID,
to uniquely identify the process? If not, do you have a suggestion for
this?
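
For reference, one namespace identifier that is visible from userspace is the inode number in the target of the /proc/<pid>/ns/pid symlink, which reads like "pid:[4026531836]". Whether the kernel should embed such a value in the wakeup source name is exactly the open question here; the sketch below only shows how a tool could parse that link text. parse_pid_ns_link() is a hypothetical helper for this example.

```c
#include <stdio.h>

/*
 * Parse the target of a /proc/<pid>/ns/pid symlink, e.g.
 * "pid:[4026531836]", into the namespace inode number.
 * Returns 0 on success, -1 if the text does not have that shape.
 */
static int parse_pid_ns_link(const char *link, unsigned long long *ino)
{
	char closing;

	if (sscanf(link, "pid:[%llu%c", ino, &closing) != 2 || closing != ']')
		return -1;
	return 0;
}
```

Two processes in the same PID namespace report the same inode number, so a (namespace inode, PID) pair would be less ambiguous than the PID alone.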

I understand that the proposed naming scheme has a chance of causing
collisions; however, it is still an improvement over the existing
naming scheme in terms of being able to attribute wakeups to a
particular application.

> you're leaking the info about the creating process into the container.
> IOW, if PID 1 ends up registering some wakeup source the container gets
> to know about it.

Is there a general security concern about this? If not, can you please
elaborate why this is a problem?

Thanks,
Isaac
Isaac Manjarres July 29, 2024, 6:35 p.m. UTC | #4
On Tue, Jul 09, 2024 at 02:04:43PM -0700, Isaac Manjarres wrote:
> On Thu, Jul 04, 2024 at 04:03:59PM +0200, Christian Brauner wrote:
> > On Wed, Jul 03, 2024 at 02:43:14PM GMT, Isaac J. Manjarres wrote:
> > > From: Manish Varma <varmam@google.com>
> > > 
> > > We'll often see aborted suspend operations that look like:
> > > 
> > >  PM: suspend entry 2024-07-03 15:55:15.372419634 UTC
> > >  PM: PM: Pending Wakeup Sources: [timerfd]
> > >  Abort: Pending Wakeup Sources: [timerfd]
> > >  PM: suspend exit 2024-07-03 15:55:15.445281857 UTC
> > > 
> > > From this, it seems a timerfd caused the abort, but that can be
> > > confusing, as timerfds don't create wakeup sources. However,
> > > eventpoll can, and when it does, it names them after the underlying
> > > file descriptor. Unfortunately, all the file descriptors are called
> > > "[timerfd]", and a system may have many timerfds, so this isn't very
> > > useful to debug what's going on to cause the suspend to abort.
> > > 
> > > To improve this, change the way eventpoll wakeup sources are named:
> > > 
> > > 1) The top-level per-process eventpoll wakeup source is now named
> > > "epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
> > > and P is the PID of the creating process.
> > > 
> > > 2) Individual eventpoll item wakeup sources are now named
> > > "epollitemN:P.F", where N is a unique ID token, P is the PID of the creating
> > > process, and F is the name of the underlying file descriptor.
> > 
> > Fyi, that PID is meaningless or even actively misleading in the face of
> > pid namespaces. And since such wakeups seem to be registered in sysfs
> > globally they are visible to all containers. That means a container will
> > now see some timerfd wakeup source with a PID that might just accidentally
> > correspond to a process inside the container. Which in turn also means
> Thanks for your feedback on this, Christian. With regards to this
> scenario: would it be useful to use a namespace ID, along with the PID,
> to uniquely identify the process? If not, do you have a suggestion for
> this?
> 
> I understand that the proposed naming scheme has a chance of causing
> collisions, however, it is still an improvement over the existing
> naming scheme in terms of being able to attribute wakeups to a
> particular application.
> 
> > you're leaking the info about the creating process into the container.
> > IOW, if PID 1 ends up registering some wakeup source the container gets
> > to know about it.
> Is there a general security concern about this? If not, can you please
> elaborate why this is a problem?
> 
Hey Christian,

I just wanted to follow up to see if you had a chance to go through my
questions above.

Thanks,
Isaac
Patch

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 752b417e8129..04a808607b62 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -209,13 +209,19 @@  EXPORT_SYMBOL_GPL(wakeup_source_remove);
 /**
  * wakeup_source_register - Create wakeup source and add it to the list.
  * @dev: Device this wakeup source is associated with (or NULL if virtual).
- * @name: Name of the wakeup source to register.
+ * @fmt: format string for the wakeup source name
  */
-struct wakeup_source *wakeup_source_register(struct device *dev,
-					     const char *name)
+__printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
+							    const char *fmt, ...)
 {
 	struct wakeup_source *ws;
 	int ret;
+	char name[128];
+	va_list args;
+
+	va_start(args, fmt);
+	vsnprintf(name, sizeof(name), fmt, args);
+	va_end(args);
 
 	ws = wakeup_source_create(name);
 	if (ws) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f53ca4f7fced..941df15208a4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -338,6 +338,7 @@  static void __init epoll_sysctls_init(void)
 #define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static atomic_t wakesource_create_id  = ATOMIC_INIT(0);
 static const struct file_operations eventpoll_fops;
 
 static inline int is_file_epoll(struct file *f)
@@ -1545,15 +1546,21 @@  static int ep_create_wakeup_source(struct epitem *epi)
 {
 	struct name_snapshot n;
 	struct wakeup_source *ws;
+	pid_t task_pid;
+	int id;
+
+	task_pid = task_pid_nr(current);
 
 	if (!epi->ep->ws) {
-		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
+		id = atomic_inc_return(&wakesource_create_id);
+		epi->ep->ws = wakeup_source_register(NULL, "epoll:%d:%d", id, task_pid);
 		if (!epi->ep->ws)
 			return -ENOMEM;
 	}
 
+	id = atomic_inc_return(&wakesource_create_id);
 	take_dentry_name_snapshot(&n, epi->ffd.file->f_path.dentry);
-	ws = wakeup_source_register(NULL, n.name.name);
+	ws = wakeup_source_register(NULL, "epollitem%d:%d.%s", id, task_pid, n.name.name);
 	release_dentry_name_snapshot(&n);
 
 	if (!ws)
diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
index 76cd1f9f1365..1fb6dca981c2 100644
--- a/include/linux/pm_wakeup.h
+++ b/include/linux/pm_wakeup.h
@@ -99,8 +99,8 @@  extern struct wakeup_source *wakeup_source_create(const char *name);
 extern void wakeup_source_destroy(struct wakeup_source *ws);
 extern void wakeup_source_add(struct wakeup_source *ws);
 extern void wakeup_source_remove(struct wakeup_source *ws);
-extern struct wakeup_source *wakeup_source_register(struct device *dev,
-						    const char *name);
+extern __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
+								   const char *fmt, ...);
 extern void wakeup_source_unregister(struct wakeup_source *ws);
 extern int wakeup_sources_read_lock(void);
 extern void wakeup_sources_read_unlock(int idx);
@@ -140,8 +140,8 @@  static inline void wakeup_source_add(struct wakeup_source *ws) {}
 
 static inline void wakeup_source_remove(struct wakeup_source *ws) {}
 
-static inline struct wakeup_source *wakeup_source_register(struct device *dev,
-							   const char *name)
+static inline __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
+									  const char *fmt, ...)
 {
 	return NULL;
 }