[1/3] procfs: fdinfo -- Extend information about epoll target files

Message ID 20170310082146.041584651@openvz.org (mailing list archive)
State New, archived

Commit Message

Cyrill Gorcunov March 10, 2017, 8:16 a.m. UTC
Since it is possible to have the same number in the tfd field
(say a file is added, closed, then another file is dup'ed to the
same number and added back), it is impossible to distinguish such
target files solely by their numbers.

Strictly speaking regular applications don't need to recognize
these targets at all but for checkpoint/restore sake we need
to collect targets to be able to push them back on restore
stage in a proper order.

Thus let's add the file position, inode and device number where
this target resides. These three fields can be used as a primary
key for sorting, and together with the help of kcmp CRIU can find
the exact target file (from the whole set of processes
being checkpointed).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Andrew Morton <akpm@linuxfoundation.org>
CC: Andrey Vagin <avagin@openvz.org>
CC: Pavel Emelyanov <xemul@virtuozzo.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Kir Kolyshkin <kir@openvz.org>
CC: Jason Baron <jbaron@akamai.com>
CC: Andy Lutomirski <luto@amacapital.net>
---
 Documentation/filesystems/proc.txt |    6 +++++-
 fs/eventpoll.c                     |    8 ++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)
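
To make the "primary key for sorting" idea concrete, here is a minimal userspace sketch of how a consumer might parse the extended line and order targets by (sdev, ino, pos). The struct and function names (epoll_tfd, parse_tfd_line, cmp_tfd) are hypothetical and not taken from CRIU; the field layout is assumed from the example line in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record for one "tfd:" line of an epoll fdinfo file. */
struct epoll_tfd {
	int tfd;                  /* target fd number, decimal */
	unsigned int events;      /* watched events mask, hex */
	unsigned long long data;  /* user data, hex */
	long long pos;            /* file offset, decimal */
	unsigned long ino;        /* inode number, hex */
	unsigned int sdev;        /* superblock device number, hex */
};

/*
 * Parse one target line. Whitespace in the scanf format matches any
 * run of blanks, so the padding between fields does not matter.
 * Returns 0 on success, -1 if any of the six fields is missing.
 */
static int parse_tfd_line(const char *line, struct epoll_tfd *t)
{
	assert(line != NULL && t != NULL);

	int n = sscanf(line,
		       "tfd: %d events: %x data: %llx pos:%lld ino:%lx sdev:%x",
		       &t->tfd, &t->events, &t->data,
		       &t->pos, &t->ino, &t->sdev);
	return n == 6 ? 0 : -1;
}

/* qsort() comparator: order by sdev, then ino, then pos. */
static int cmp_tfd(const void *a, const void *b)
{
	const struct epoll_tfd *x = a, *y = b;

	if (x->sdev != y->sdev)
		return x->sdev < y->sdev ? -1 : 1;
	if (x->ino != y->ino)
		return x->ino < y->ino ? -1 : 1;
	if (x->pos != y->pos)
		return x->pos < y->pos ? -1 : 1;
	return 0;
}
```

A caller would read /proc/<pid>/fdinfo/<epoll-fd> line by line, pass each line starting with "tfd:" to parse_tfd_line(), and run qsort() with cmp_tfd over the collected records. A line in the old three-field format makes parse_tfd_line() return -1, so a consumer can detect kernels without this patch.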

Comments

Andrey Vagin March 17, 2017, 4:59 a.m. UTC | #1
On Fri, Mar 10, 2017 at 11:16:56AM +0300, Cyrill Gorcunov wrote:
> Index: linux-ml.git/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-ml.git.orig/Documentation/filesystems/proc.txt
> +++ linux-ml.git/Documentation/filesystems/proc.txt
> @@ -1779,12 +1779,16 @@ pair provide additional information part
>  	pos:	0
>  	flags:	02
>  	mnt_id:	9
> -	tfd:        5 events:       1d data: ffffffffffffffff
> +	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7

I think it may be better to print mnt_id instead of sdev, because there
may be two file descriptors opened from different bind mounts.

>  
>  	where 'tfd' is a target file descriptor number in decimal form,
>  	'events' is events mask being watched and the 'data' is data
>  	associated with a target [see epoll(7) for more details].
>  
> +	The 'pos' is current offset of the target file in decimal form
> +	[see lseek(2)], 'ino' and 'sdev' are inode and device numbers
> +	where target file resides, all in hex format.
> +
>  	Fsnotify files
>  	~~~~~~~~~~~~~~
>  	For inotify files the format is the following
> Index: linux-ml.git/fs/eventpoll.c
> ===================================================================
> --- linux-ml.git.orig/fs/eventpoll.c
> +++ linux-ml.git/fs/eventpoll.c
> @@ -883,10 +883,14 @@ static void ep_show_fdinfo(struct seq_fi
>  	mutex_lock(&ep->mtx);
>  	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
>  		struct epitem *epi = rb_entry(rbp, struct epitem, rbn);
> +		struct inode *inode = file_inode(epi->ffd.file);
>  
> -		seq_printf(m, "tfd: %8d events: %8x data: %16llx\n",
> +		seq_printf(m, "tfd: %8d events: %8x data: %16llx "
> +			   " pos:%lli ino:%lx sdev:%x\n",
>  			   epi->ffd.fd, epi->event.events,
> -			   (long long)epi->event.data);
> +			   (long long)epi->event.data,
> +			   (long long)epi->ffd.file->f_pos,
> +			   inode->i_ino, inode->i_sb->s_dev);
>  		if (seq_has_overflowed(m))
>  			break;
>  	}
>
Cyrill Gorcunov March 17, 2017, 8:26 a.m. UTC | #2
On Thu, Mar 16, 2017 at 09:59:09PM -0700, Andrei Vagin wrote:
> On Fri, Mar 10, 2017 at 11:16:56AM +0300, Cyrill Gorcunov wrote:
> > -	tfd:        5 events:       1d data: ffffffffffffffff
> > +	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
> 
> I think it may be better to print mnt_id instead of sdev, because there
> may be two file descriptors opened from different bind mounts.

Fetching mnt_id is not as cheap as fetching sdev: instead of a
straight dereference of inode->i_sb->s_dev we would have to figure
out mnt_id from file+path, and our primary key is sdev+ino anyway,
so until it is _really_ needed I prefer the cheaper/simpler solution.
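
Because sdev+ino only narrows the set of candidates (two descriptors opened through different bind mounts of the same filesystem share that key), the final identity check is left to kcmp, as the commit message says. The following is a minimal sketch of such a check, assuming a kernel built with kcmp support and a caller permitted to inspect both tasks; the wrapper name same_open_file is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel whether fd1 in task pid1 and fd2 in task pid2 refer
 * to the same open file description.
 * Returns 1 if they do, 0 if they do not, -1 on error (errno is set,
 * e.g. ENOSYS when kcmp is not available, EPERM when not permitted).
 */
static int same_open_file(pid_t pid1, int fd1, pid_t pid2, int fd2)
{
	/* glibc has no dedicated wrapper, so call the syscall directly. */
	long r = syscall(SYS_kcmp, (long)pid1, (long)pid2, (long)KCMP_FILE,
			 (unsigned long)fd1, (unsigned long)fd2);

	if (r < 0)
		return -1;
	/* 0 means equal; positive values only encode an ordering. */
	return r == 0 ? 1 : 0;
}
```

A tool would typically call this only for pairs of targets whose sdev, ino and pos already match, which keeps the number of kcmp calls small.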

Patch

Index: linux-ml.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-ml.git.orig/Documentation/filesystems/proc.txt
+++ linux-ml.git/Documentation/filesystems/proc.txt
@@ -1779,12 +1779,16 @@  pair provide additional information part
 	pos:	0
 	flags:	02
 	mnt_id:	9
-	tfd:        5 events:       1d data: ffffffffffffffff
+	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 	where 'tfd' is a target file descriptor number in decimal form,
 	'events' is events mask being watched and the 'data' is data
 	associated with a target [see epoll(7) for more details].
 
+	The 'pos' is current offset of the target file in decimal form
+	[see lseek(2)], 'ino' and 'sdev' are inode and device numbers
+	where target file resides, all in hex format.
+
 	Fsnotify files
 	~~~~~~~~~~~~~~
 	For inotify files the format is the following
Index: linux-ml.git/fs/eventpoll.c
===================================================================
--- linux-ml.git.orig/fs/eventpoll.c
+++ linux-ml.git/fs/eventpoll.c
@@ -883,10 +883,14 @@  static void ep_show_fdinfo(struct seq_fi
 	mutex_lock(&ep->mtx);
 	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		struct epitem *epi = rb_entry(rbp, struct epitem, rbn);
+		struct inode *inode = file_inode(epi->ffd.file);
 
-		seq_printf(m, "tfd: %8d events: %8x data: %16llx\n",
+		seq_printf(m, "tfd: %8d events: %8x data: %16llx "
+			   " pos:%lli ino:%lx sdev:%x\n",
 			   epi->ffd.fd, epi->event.events,
-			   (long long)epi->event.data);
+			   (long long)epi->event.data,
+			   (long long)epi->ffd.file->f_pos,
+			   inode->i_ino, inode->i_sb->s_dev);
 		if (seq_has_overflowed(m))
 			break;
 	}
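
One detail of the hunk above is easy to miss: the two adjacent string literals are concatenated by the compiler, and since the first ends with a space and the second begins with one, the emitted line has two spaces before "pos:", whereas the documentation example shows one. The userspace sketch below reproduces the format string with snprintf() to show the resulting layout; format_tfd_line is a hypothetical helper, not kernel code, and it takes the data value as unsigned rather than casting as the kernel does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one target line with the same format string as the patch.
 * Returns the number of characters snprintf() would have written.
 */
static int format_tfd_line(char *buf, size_t len, int tfd,
			   unsigned int events, unsigned long long data,
			   long long pos, unsigned long ino,
			   unsigned int sdev)
{
	return snprintf(buf, len,
			"tfd: %8d events: %8x data: %16llx "
			" pos:%lli ino:%lx sdev:%x\n",
			tfd, events, data, pos, ino, sdev);
}
```

Called with the values from the documentation example (5, 0x1d, all-ones data, 0, 0x61af, 7), the buffer ends in "ffffffffffffffff  pos:0 ino:61af sdev:7". Parsers are therefore better off treating the separator as arbitrary whitespace than as a single space.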