Message ID | 20170221171254.954209904@openvz.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Feb 21, 2017 at 8:59 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote: > Since it is possbile to have same number in tfd field (say > file added, closed, then nother file dup'ed to same number > and added back) it is imposible to distinguish such target > files solely by their numbers. > > Strictly speaking regular applications don't need to recognize > these targets at all but for checkpoint/restore sake we need > to collect targets to be able to push them back on restore > in proper order. > > Thus lets add file position, inode and device number where > this target lays. This three fields can be used as a primary > key for sorting, and together with kcmp help CRIU can find > out an exact file target (from the whole set of processes > being checkpointed). I have no problem with this, but I'm wondering whether kcmp's ordered comparisons could also be used for this purpose.
On Tue, Feb 21, 2017 at 10:41:12AM -0800, Andy Lutomirski wrote: > > Thus lets add file position, inode and device number where > > this target lays. This three fields can be used as a primary > > key for sorting, and together with kcmp help CRIU can find > > out an exact file target (from the whole set of processes > > being checkpointed). > > I have no problem with this, but I'm wondering whether kcmp's ordered > comparisons could also be used for this purpose. Yes it can, but it would increas number of kcmp calls signisicantly. Look, here is how we build files tree in criu: we take inode^sdev^pos as a primary key and remember it inside rbtree while we're dumping files (note also that we don't keep files opened but rather dump them in chunks). Then once we find that two files have same primary key we use kcmp to build subtree. This really helps a lot. And I plan to do the same with target files from epolls: - they gonna be handled after all opened files of all processes in container (or children processes if dumping single task), thus the complete tree with primary key already will be built - on every target file I calculate primary key and then using kcmp will find if file is exactly one matching
On 02/21/2017 10:16 PM, Cyrill Gorcunov wrote: > On Tue, Feb 21, 2017 at 10:41:12AM -0800, Andy Lutomirski wrote: >>> Thus lets add file position, inode and device number where >>> this target lays. This three fields can be used as a primary >>> key for sorting, and together with kcmp help CRIU can find >>> out an exact file target (from the whole set of processes >>> being checkpointed). >> >> I have no problem with this, but I'm wondering whether kcmp's ordered >> comparisons could also be used for this purpose. > > Yes it can, but it would increas number of kcmp calls signisicantly. Actually it shouldn't. If you extend the kcmp argument to accept the epollfd:epollslot pair, this would be effectively the same as if you had all your epoll-ed files injected into your fdtable with "strange" fd numbers. We already have two-level rbtree for this in criu, adding extended ("strange") fd to it should be OK. > Look, here is how we build files tree in criu: we take inode^sdev^pos > as a primary key and remember it inside rbtree while we're dumping files > (note also that we don't keep files opened but rather dump them in > chunks). Then once we find that two files have same primary key > we use kcmp to build subtree. This really helps a lot. And I plan > to do the same with target files from epolls: > > - they gonna be handled after all opened files of all processes > in container (or children processes if dumping single task), > thus the complete tree with primary key already will be built > > - on every target file I calculate primary key and then using > kcmp will find if file is exactly one matching > . >
On Wed, Feb 22, 2017 at 10:44:07AM +0300, Pavel Emelyanov wrote: > On 02/21/2017 10:16 PM, Cyrill Gorcunov wrote: > > On Tue, Feb 21, 2017 at 10:41:12AM -0800, Andy Lutomirski wrote: > >>> Thus lets add file position, inode and device number where > >>> this target lays. This three fields can be used as a primary > >>> key for sorting, and together with kcmp help CRIU can find > >>> out an exact file target (from the whole set of processes > >>> being checkpointed). > >> > >> I have no problem with this, but I'm wondering whether kcmp's ordered > >> comparisons could also be used for this purpose. > > > > Yes it can, but it would increas number of kcmp calls signisicantly. > > Actually it shouldn't. If you extend the kcmp argument to accept the > epollfd:epollslot pair, this would be effectively the same as if you > had all your epoll-ed files injected into your fdtable with "strange" > fd numbers. We already have two-level rbtree for this in criu, adding > extended ("strange") fd to it should be OK. Nope. Pavel, I guess you forget how we handle file tree in criu currently. We call for kcmp only if we have to -- when primary key for two entries is the same.
On 02/22/2017 10:54 AM, Cyrill Gorcunov wrote: > On Wed, Feb 22, 2017 at 10:44:07AM +0300, Pavel Emelyanov wrote: >> On 02/21/2017 10:16 PM, Cyrill Gorcunov wrote: >>> On Tue, Feb 21, 2017 at 10:41:12AM -0800, Andy Lutomirski wrote: >>>>> Thus lets add file position, inode and device number where >>>>> this target lays. This three fields can be used as a primary >>>>> key for sorting, and together with kcmp help CRIU can find >>>>> out an exact file target (from the whole set of processes >>>>> being checkpointed). >>>> >>>> I have no problem with this, but I'm wondering whether kcmp's ordered >>>> comparisons could also be used for this purpose. >>> >>> Yes it can, but it would increas number of kcmp calls signisicantly. >> >> Actually it shouldn't. If you extend the kcmp argument to accept the >> epollfd:epollslot pair, this would be effectively the same as if you >> had all your epoll-ed files injected into your fdtable with "strange" >> fd numbers. We already have two-level rbtree for this in criu, adding >> extended ("strange") fd to it should be OK. > > Nope. Pavel, I guess you forget how we handle file tree in criu currently. > We call for kcmp only if we have to -- when primary key for two entries > is the same. True, but the latter is an optimization to reduce the number of syscalls. Look, in order to have a primary key you need to do some system call for the fd you check (read from proc or stat the descriptor). But for target files in e-polls you don't make this per-fd syscall to get primary key, just call the kcmp instead. -- Pavel
On Wed, Feb 22, 2017 at 11:09:23AM +0300, Pavel Emelyanov wrote: > >> > >> Actually it shouldn't. If you extend the kcmp argument to accept the > >> epollfd:epollslot pair, this would be effectively the same as if you > >> had all your epoll-ed files injected into your fdtable with "strange" > >> fd numbers. We already have two-level rbtree for this in criu, adding > >> extended ("strange") fd to it should be OK. > > > > Nope. Pavel, I guess you forget how we handle file tree in criu currently. > > We call for kcmp only if we have to -- when primary key for two entries > > is the same. > > True, but the latter is an optimization to reduce the number of syscalls. Exactly. While syscalls are quite effective, they are still not coming for free, so I'm trying to reduce their number as much as possible. > Look, in order to have a primary key you need to do some system call for the > fd you check (read from proc or stat the descriptor). But for target files in > e-polls you don't make this per-fd syscall to get primary key, just call the > kcmp instead. I have to parse fdinfo anyway, because I need to fetch queued events and mask. So I'll _have_ to make this per-fd syscall for parsing. And this opens a way to optimize overall picture -- we can immediately read primary key and reduce kcmp calls.
On 02/22/2017 11:18 AM, Cyrill Gorcunov wrote: > On Wed, Feb 22, 2017 at 11:09:23AM +0300, Pavel Emelyanov wrote: >>>> >>>> Actually it shouldn't. If you extend the kcmp argument to accept the >>>> epollfd:epollslot pair, this would be effectively the same as if you >>>> had all your epoll-ed files injected into your fdtable with "strange" >>>> fd numbers. We already have two-level rbtree for this in criu, adding >>>> extended ("strange") fd to it should be OK. >>> >>> Nope. Pavel, I guess you forget how we handle file tree in criu currently. >>> We call for kcmp only if we have to -- when primary key for two entries >>> is the same. >> >> True, but the latter is an optimization to reduce the number of syscalls. > > Exactly. While syscalls are quite effective, they are still not coming > for free, so I'm trying to reduce their number as much as possible. > >> Look, in order to have a primary key you need to do some system call for the >> fd you check (read from proc or stat the descriptor). But for target files in >> e-polls you don't make this per-fd syscall to get primary key, just call the >> kcmp instead. > > I have to parse fdinfo anyway, because I need to fetch queued events and mask. > So I'll _have_ to make this per-fd syscall for parsing. And this opens > a way to optimize overall picture -- we can immediately read primary > key and reduce kcmp calls. You read fdinfo per-epoll, but kcmp-s we're talking here are about per-target-files. So having dev:ino pair would help to reduce the number of kcmps, but even w/o this extension we can work OK. Besides, in most of the cases fd number you'd read from epoll's fdinfo will actually be present in task's fdtable, so you can call a single kcmp, make sure the file is correct and that's it. The need to actually _search_ for the runaway file with the set of kcmp will (should) be quite rare case. -- Pavel
On Wed, Feb 22, 2017 at 11:29:23AM +0300, Pavel Emelyanov wrote: > On 02/22/2017 11:18 AM, Cyrill Gorcunov wrote: > > On Wed, Feb 22, 2017 at 11:09:23AM +0300, Pavel Emelyanov wrote: > >>>> > >>>> Actually it shouldn't. If you extend the kcmp argument to accept the > >>>> epollfd:epollslot pair, this would be effectively the same as if you > >>>> had all your epoll-ed files injected into your fdtable with "strange" > >>>> fd numbers. We already have two-level rbtree for this in criu, adding > >>>> extended ("strange") fd to it should be OK. > >>> > >>> Nope. Pavel, I guess you forget how we handle file tree in criu currently. > >>> We call for kcmp only if we have to -- when primary key for two entries > >>> is the same. > >> > >> True, but the latter is an optimization to reduce the number of syscalls. > > > > Exactly. While syscalls are quite effective, they are still not coming > > for free, so I'm trying to reduce their number as much as possible. > > > >> Look, in order to have a primary key you need to do some system call for the > >> fd you check (read from proc or stat the descriptor). But for target files in > >> e-polls you don't make this per-fd syscall to get primary key, just call the > >> kcmp instead. > > > > I have to parse fdinfo anyway, because I need to fetch queued events and mask. > > So I'll _have_ to make this per-fd syscall for parsing. And this opens > > a way to optimize overall picture -- we can immediately read primary > > key and reduce kcmp calls. > > You read fdinfo per-epoll, but kcmp-s we're talking here are about per-target-files. > So having dev:ino pair would help to reduce the number of kcmps, but even w/o > this extension we can work OK. I didn't say we can't. But since we're reading fdinfo anyway it will help I don't see a single reason why should not we take this opportunity to speedup. > Besides, in most of the cases fd number you'd read from epoll's fdinfo will actually > be present in task's fdtable, so you can call a single kcmp, make sure the file is > correct and that's it. The need to actually _search_ for the runaway file with the > set of kcmp will (should) be quite rare case. Yes. But this rare cases are the reason why I started this series :( I would love to not add new code at all but simply had to.
Index: linux-ml.git/Documentation/filesystems/proc.txt =================================================================== --- linux-ml.git.orig/Documentation/filesystems/proc.txt +++ linux-ml.git/Documentation/filesystems/proc.txt @@ -1779,12 +1779,16 @@ pair provide additional information part pos: 0 flags: 02 mnt_id: 9 - tfd: 5 events: 1d data: ffffffffffffffff + tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7 where 'tfd' is a target file descriptor number in decimal form, 'events' is events mask being watched and the 'data' is data associated with a target [see epoll(7) for more details]. + The 'pos' is current offset of the target file in decimal form + [see lseek(2)], 'ino' and 'sdev' are inode and device numbers + where target file resides, all in hex format. + Fsnotify files ~~~~~~~~~~~~~~ For inotify files the format is the following Index: linux-ml.git/fs/eventpoll.c =================================================================== --- linux-ml.git.orig/fs/eventpoll.c +++ linux-ml.git/fs/eventpoll.c @@ -883,10 +883,14 @@ static void ep_show_fdinfo(struct seq_fi mutex_lock(&ep->mtx); for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) { struct epitem *epi = rb_entry(rbp, struct epitem, rbn); + struct inode *inode = file_inode(epi->ffd.file); - seq_printf(m, "tfd: %8d events: %8x data: %16llx\n", + seq_printf(m, "tfd: %8d events: %8x data: %16llx " + " pos:%lli ino:%lx sdev:%x\n", epi->ffd.fd, epi->event.events, - (long long)epi->event.data); + (long long)epi->event.data, + (long long)epi->ffd.file->f_pos, + inode->i_ino, inode->i_sb->s_dev); if (seq_has_overflowed(m)) break; }
Since it is possbile to have same number in tfd field (say file added, closed, then nother file dup'ed to same number and added back) it is imposible to distinguish such target files solely by their numbers. Strictly speaking regular applications don't need to recognize these targets at all but for checkpoint/restore sake we need to collect targets to be able to push them back on restore in proper order. Thus lets add file position, inode and device number where this target lays. This three fields can be used as a primary key for sorting, and together with kcmp help CRIU can find out an exact file target (from the whole set of processes being checkpointed). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Al Viro <viro@zeniv.linux.org.uk> CC: Andrew Morton <akpm@linuxfoundation.org> CC: Andrey Vagin <avagin@openvz.org> CC: Pavel Emelyanov <xemul@virtuozzo.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> CC: Kir Kolyshkin <kir@openvz.org> CC: Jason Baron <jbaron@akamai.com> CC: Andy Lutomirski <luto@amacapital.net> --- Documentation/filesystems/proc.txt | 6 +++++- fs/eventpoll.c | 8 ++++++-- 2 files changed, 11 insertions(+), 3 deletions(-)